1. Updating coffee sales data
In this example, I will continue my role as a data engineer at a global coffee retailing company. I am responsible for maintaining all of our sales-related datasets, and I constantly receive new data from our various data sources.
I will begin by setting my notebook to run from my centralized data schema. Doing so makes it easier to construct my queries, as all of my data resides in the same schema.
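As a minimal sketch of what that setup could look like, the statements below set a default catalog and schema for the session; the names main and coffee_sales are hypothetical placeholders, not taken from the example.

```sql
-- Point the notebook session at the centralized data schema
-- (catalog and schema names are hypothetical examples).
USE CATALOG main;
USE SCHEMA coffee_sales;
```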
I will next take a look at some new sales data that I have received. I have previously created a table called sales_new from the original CSV file. Looking at the data, I can see that these are all net-new records, so they can simply be appended to my sales table. In Databricks SQL, I can accomplish this with the INSERT INTO statement. Since both datasets are stored in Unity Catalog tables, this is an easy process: I just reference my target and source tables in the query. When I query my final dataset, I can see that my sales table now contains the most recent sales records.
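A hedged sketch of that append, assuming sales_new has the same column layout as sales (the sale_date column in the check query is a hypothetical name):

```sql
-- Append all net-new records from the source table to the target table.
-- Assumes both tables share the same schema.
INSERT INTO sales
SELECT * FROM sales_new;

-- Confirm the latest sales records now appear in the target table
-- (sale_date is a hypothetical column name).
SELECT MAX(sale_date) AS latest_sale_date
FROM sales;
```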
Next, I can start to work with some of my product data. My product dataset is fairly static, as our company has had the same products for quite some time. We do, however, periodically update how we refer to these products internally. In this products_updated table, we can see that every product has new values for the hierarchy columns, as we have re-categorized every product we sell. Since these records are updated versions of existing rows, I want to use the MERGE INTO syntax in Databricks SQL. I will again specify my target and source tables, and indicate that the product_id column is the key to match rows on. When a match is found, I will UPDATE all of the columns in the matched row. Now, when I query the products table, I can see that it reflects the latest categories.
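A sketch of that merge, assuming products and products_updated share the same columns and product_id uniquely identifies each row:

```sql
-- Upsert the re-categorized rows into the products table,
-- matching on product_id and updating every column on a match.
MERGE INTO products AS target
USING products_updated AS source
  ON target.product_id = source.product_id
WHEN MATCHED THEN
  UPDATE SET *;
```

Because every incoming row is an update to an existing product, no WHEN NOT MATCHED clause is needed here; if new products could arrive in the same file, a WHEN NOT MATCHED THEN INSERT * clause could be added to the same statement.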
By using these two strategies, I am confident that I can handle the most common data updates I run into. Both strategies work when the table structure stays the same, and they can even be extended to handle schema changes.
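For the schema-change case, one option with Delta Lake tables is to enable automatic schema evolution before the merge so that new source columns are added to the target; this configuration-based sketch is an assumption about how the scenario might be handled, not a step shown in the example (newer Databricks runtimes also offer a MERGE WITH SCHEMA EVOLUTION syntax).

```sql
-- Allow MERGE to add columns that exist in the source
-- but not yet in the target (Delta Lake automatic schema evolution).
SET spark.databricks.delta.schema.autoMerge.enabled = true;

MERGE INTO products AS target
USING products_updated AS source
  ON target.product_id = source.product_id
WHEN MATCHED THEN
  UPDATE SET *;
```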
Now, let's go practice these techniques with our insurance datasets.
2. Let's practice!