1. Querying our data
Now that we have connected some files to our DataLab, it's time to use SQL.
2. The dataset
Our examples will use a modified synthetic sales dataset that can be obtained from Kaggle.
It contains simulations of sales transactions, where each row or record is an individual sale. These records include information on the product, region, sales representative, pricing, and more. Not all are shown here. There are no null values in this dataset.
3. Sample query
We've seen that when uploading a file in a new workbook, a cell will automatically generate within the notebook to help us get started. Let's modify this query to start exploring our data.
4. Querying our data
We'll begin by counting the number of records in this dataset.
Recall that we'll need to use the file name and extension in the FROM clause. The rest of the query will look familiar.
We can hit run or use the keyboard shortcut control plus enter to run the current code cell. Our output is generated directly under the query.
5. Querying our data
Here's what that looks like on DataLab when clicking the run button next to the cell.
6. SQL cell headers
A SQL cell will always have a header. The first dropdown is the data source, in this case, our CSV file.
We also have the option here to work with some sample datasets, or add another new data source or file.
7. SQL cell headers
The second dropdown gives us two options on how the query should run: DataFrame mode or Query mode. Let's dig into these to understand the differences.
8. Modes
DataFrame mode will return the complete query output and store it as a pandas DataFrame with the name specified in the field provided. Here we've named it df_name.
Query mode is not available for DataFrames and CSVs, but it is available for other data sources like Google Sheets. It returns only a preview of the query output and saves the query to the specified name. Query mode can be helpful if our query takes too long as it reduces the computational resources needed to generate a lengthy output.
In fact, chaining a few queries together is similar to using common table expressions, which is something you'll learn about as you develop your SQL skills.
9. Query mode with a sample dataset
Let's see how to use query mode with a sample dataset.
We create a new SQL cell, select the option to use a sample dataset, filter for PostgreSQL datasets and select an option. Then we can change the mode, and define a name for our query before running the cell.
10. Let's practice!
Naming and storing our query or results will become useful in the following video, so stay tuned. For now, let's practice.