Hydrating the lakehouse

1. Hydrating the lakehouse

In this scenario, I'll continue in my role as a data analyst for a large coffee retail company. I've been handed several data files to analyze, and I want to ingest them into Databricks so I can take full advantage of the platform's tools.

I start in the Data Ingestion section of Databricks, where I can upload one of my CSV files to get a quick look at the data. Using the GUI, I create a table from the *domestic_consumption* file, which contains columns showing total coffee consumption by origin and year. This is a straightforward, user-friendly way to get started, and I can already see that this data will be useful for future analysis.

To speed things up, I take a more programmatic approach for the remaining files. I switch to the SQL Editor pane and write a script using the `COPY INTO` command, which lets me efficiently create tables from the files I've already uploaded to a Databricks Volume (a minimal sketch of such a script follows this walkthrough). Finding the file paths is easy: I just open the Catalog Explorer and copy them directly from the catalog pane on the left-hand side. With this script, I can create a table for each file in just a few minutes.

Once the script runs, each table is populated with the data from its respective file. I jump back to the Catalog Explorer to check out my newly created tables. It's great to see everything organized: I can view an overview of each table's columns and even preview sample data directly in the interface.

Now that I've ingested enough data, I'm ready to build out a more comprehensive data model and expand on my initial analysis. This is where things get exciting! In the upcoming exercises, you'll get hands-on experience ingesting data with a variety of techniques, laying the foundation for your own data model and analyses. Let's dive in and explore the possibilities together!
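To make the scripted approach concrete, here is a minimal sketch of one `COPY INTO` statement, assuming the CSV files live in a Unity Catalog Volume. The catalog, schema, volume, and file names below are hypothetical placeholders rather than the course's actual paths; substitute the paths you copy from the Catalog Explorer.

```sql
-- Create an empty, schemaless Delta table as the load target
-- (catalog/schema/table names here are illustrative placeholders).
CREATE TABLE IF NOT EXISTS coffee.analysis.re_exports;

-- Load a CSV from a Volume, reading the header row and inferring
-- column types; mergeSchema lets the empty table adopt that schema.
COPY INTO coffee.analysis.re_exports
FROM '/Volumes/coffee/analysis/raw_files/re_exports.csv'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```

Repeating this pair of statements for each file is all the script needs. A nice property of `COPY INTO` is that it is idempotent: rerunning the script skips files that have already been loaded, so it is safe to execute more than once.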

2. Let's practice!