1. Creating a coffee data layer
In this scenario, I’m continuing in my role as a data practitioner for a global coffee retailer. My task is to refine various tables from different parts of the business to ensure that downstream data teams can perform their analytics efficiently.
One of our data analysts has requested a dataset that combines sales, store, and product information. Since they’re not familiar with data engineering processes, they prefer having all this information consolidated into a single table for ease of use.
I’ll begin by reviewing the store data, which will serve as the foundation for this dataset. The store table contains basic details about each location, but to enrich it, I’ll incorporate sales data. Using the Catalog Explorer in the SQL Editor, I see that both the store and sales tables share a common store_id column, which I can use as a join key.
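A quick way to confirm that shared column from the editor is to describe both tables. This is just a sketch; the table names come from this scenario:

  DESCRIBE TABLE store;   -- lists the store columns, including store_id
  DESCRIBE TABLE sales;   -- confirms the sales table also carries a store_id column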
To merge these datasets, I’ll perform a LEFT JOIN from the store table, so every store is kept along with whatever sales records match it. From the sales table, I’ll bring in the relevant columns: product_id, date, sales, price, and revenue.
Next, I’ll join the product data. Both the sales and product tables have a product_id column, so I’ll use this as the key for another LEFT JOIN. This will allow me to pull in the first three levels of the product hierarchy, which indicate the product categories.
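As a sketch, the two joins might look like this. The table names and the sales columns come from the scenario, but the product hierarchy column names (category_1 through category_3) are my own placeholders:

  SELECT
    st.*,                     -- basic store details
    sa.product_id,
    sa.date,
    sa.sales,
    sa.price,
    sa.revenue,
    pr.category_1,            -- first three levels of the product hierarchy
    pr.category_2,
    pr.category_3
  FROM store AS st
  LEFT JOIN sales AS sa
    ON sa.store_id = st.store_id
  LEFT JOIN product AS pr
    ON pr.product_id = sa.product_id;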
Once the data is merged, I’ll apply some final touches, such as formatting monetary values correctly and rounding the sale quantities down to whole numbers for consistency.
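The exact functions are an assumption on my part, but a minimal sketch of those clean-up expressions, shown against the sales table alone, could be:

  SELECT
    ROUND(sa.price, 2)   AS price,    -- monetary values to two decimal places
    ROUND(sa.revenue, 2) AS revenue,
    FLOOR(sa.sales)      AS sales     -- quantities rounded down to whole numbers
  FROM sales AS sa;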
To wrap up, I’ll use a CREATE TABLE AS (CTAS) statement to save the result as a new table called store_sales. This table will give the analyst a comprehensive view, combining all the data they need for their analysis.
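Putting the pieces together, the final statement might look like the sketch below; again, the hierarchy column names and the rounding functions are assumptions for illustration:

  CREATE TABLE store_sales AS
  SELECT
    st.*,
    sa.product_id,
    sa.date,
    FLOOR(sa.sales)      AS sales,
    ROUND(sa.price, 2)   AS price,
    ROUND(sa.revenue, 2) AS revenue,
    pr.category_1,
    pr.category_2,
    pr.category_3
  FROM store AS st
  LEFT JOIN sales AS sa
    ON sa.store_id = st.store_id
  LEFT JOIN product AS pr
    ON pr.product_id = sa.product_id;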
In the next exercises, you’ll practice these same techniques to transform your datasets into an analytics-ready data model.
2. Let's practice!