1. Pulling data from databases
Welcome back! In this chapter, we will learn how the csvkit library brings SQL-like functionalities to the command line. First, let's see how we can pull data from databases using the command sql2csv.
2. sql2csv: documentation
sql2csv is a command in the csvkit library that allows us to access a variety of popular SQL databases and dialects, including Microsoft SQL Server, MySQL, Oracle, PostgreSQL, and Sqlite. The beauty of this is that sql2csv gives us access via the command line without having to go through database clients like PgAdmin or TablePlus.
Like the name suggests, sql2csv executes SQL commands, pulls data from the database, and saves the results as a local CSV file.
For full documentation, use sql2csv dash-h.
3. sql2csv: querying against the database
The syntax for sql2csv is fairly involved, so let's take this step by step.
First, the entirety of the code block here is one command. Because the syntax is so long, for readability purposes, we need to break it into several lines by using the back-slash at the end of each line. The back-slash tells our terminal that this particular command hasn't finished yet and needs to continue on in the next line.
While you might not need to do this if you are typing directly into the terminal, this is helpful to know if you are saving your commands in, say, a shell script. Each language has its own styling guide for how long a line of code can be. For shell scripts, the limit is 80 characters per line.
sql2csv needs us to tell it how to connect to the database. This is done using the dash-dash-d-b option, followed by a string, in quotation marks, denoting the database connection and location.
Because sql2csv is compatible with multiple types of SQL databases, the syntax also varies for each type. For example, for SQLite databases, the string starts with sqlite colon three forward slashes, followed by the database name, in this case Spotify-database, and then ends with dot-d-b, short for database.
Had this been a Postgres or MySQL database, then the string would have started with postgres colon three forward slashes or mysql colon three forward-slashes, but there would be no need for the ending database extension.
4. sql2csv: querying against the database
The second part of this command is the dash-dash-query option. Here, we pass in the SQL query to be used to pull data from the tables stored in the database. Like with the previous argument, the SQL query is also a string so it also needs to be in quotation marks.
For our sample syntax, the Spotify-database has a table called Spotify underscore popularity. To pull all the data from this table, we pass in the query quotation marks select star from Spotify underscore popularity end quotation marks.
Two things worth noting. One, different databases have slightly different variations on SQL syntax. Always use the SQL syntax that is compatible with the database specified.
Two, the query string always needs to be written in one single line, no matter how long or complex the query. Otherwise, it would lead to unexpected errors.
5. sql2csv: querying against the database
Lastly, if we do not re-direct the output of our SQL query, then sql2csv will simply print all the query results in the console log. The re-direct sign takes our query results and saves it to a new CSV file in our current directory.
Here, we save the output to Spotify underscore popularity dot csv.
6. Let's practice!
That's a lot going on with just one sql2csv command! Let's put our new knowledge to practice!