1. Manipulating data using SQL syntax
You don't necessarily need to be a die-hard SQL fan to appreciate that sometimes data manipulation tasks are just easier with SQL. csvkit has a command csvsql that allows us to apply SQL-like syntax to local CSV files, without needing to set up a local database.
2. csvsql: documentation
csvsql is a command in the csvkit library that allows us to apply SQL-like statements to one or more locally saved CSV files.
Under the hood, this works because csvsql creates an in-memory SQL database that temporarily hosts the entire CSV file you are processing.
This means that, while convenient, this command is not suitable for large file processing or complex SQL querying.
For full documentation, use csvsql dash-h.
3. csvsql: applying SQL to a local CSV file
Let's set up an example where we have a CSV file Spotify underscore music attributes dot csv in our current directory.
4. csvsql: applying SQL to a local CSV file
If we wish to print only the first row in this CSV, one of many ways we can do so is by applying SQL syntax to Spotify music attributes via the csvsql command.
csvsql takes in the SQL query by first, using the dash-dash-query option, followed by the SQL query in quotation marks, as a string. The SQL query is written as though Spotify music attributes is a table that has been loaded into a database.
Please note, that it is crucial to follow the SQL query with the exact location of the data file we are referencing, relative to the current file directory. Otherwise, csvsql will not know where to look for the data file.
To put everything together, the code reads: csvsql dash dash query quotation marks select star from Spotify underscore music attributes limit 1 end quotation marks Spotify underscore music attributes dot csv.
This prints the first row of the data stored in the Spotify music attributes CSV file.
5. csvsql: applying SQL to a local CSV file
If we want a prettier printout, we can always pipe the results of our csvsql query to csvlook by attaching pipe-csvlook at the end of our command. Now, it's much easier to see which data point belongs to which column!
6. csvsql: applying SQL to a local CSV file
Lastly, if we want to save the results to a new file, we can use the re-direct operator to save the output to one-song-file-dot-csv.
When the command executes, there is nothing printed to console. For a sanity check, we use ls to confirm that the file is created correctly.
7. csvsql: joining CSVs using SQL syntax
csvsql can even be applied to more than 1 csv file at a time! As seen here, you can even join two CSV files file-a and file-b using SQL join syntax.
A few key notes are that one, the SQL query, no matter how long or complex, must be written in one line with no line breaks. Sadly, because of limitations to our slide size, the rest of the query is abbreviated using dot-dot-dot.
Secondly, any files that are mentioned in the query must be noted after the query ends, in the order of the file appearance in the SQL query. In this case, we referenced file-a first in our join then file-b, so the order is file underscore a dot csv space file underscore b dot csv
8. Let's practice!
Let's get some practice in, now that we can use SQL syntax even outside of a database!