Hydrating the lakehouse

1. Hydrating the lakehouse

In this scenario, I'll continue in my role as a data analyst for a large coffee retail company. I've been handed several data files to analyze, and I want to ingest them into Databricks so I can take full advantage of the platform's tools.

I start in the Data Ingestion section of Databricks, where I can upload one of my CSV files to get a quick look at the data. Using the GUI, I create a table from the *domestic_consumption* file, which contains columns showing total coffee consumption by origin and year. This is a straightforward, user-friendly way to get started, and I can already see that this data will be useful for future analysis.

To speed things up, I take a more programmatic approach for the remaining files. I switch to the SQL Editor pane and write a script using the `COPY INTO` command, which lets me efficiently create tables from the files I've already uploaded to a Databricks Volume (a minimal sketch of such a script follows this walkthrough). Finding the file paths is easy: I just open the Catalog Explorer and copy them directly from the catalog pane on the left-hand side. With this script, I can create a table for each file in just a few minutes.

Once the script runs, each table is populated with the data from its respective file. I jump back to the Catalog Explorer to check out my newly created tables. It's great to see everything organized: I can view an overview of each table's columns and even preview sample data directly in the interface.

Now that I've ingested enough data, I'm ready to build out a more comprehensive data model and expand on my initial analysis. This is where things get exciting! In the upcoming exercises, you'll get hands-on experience ingesting data with a variety of techniques, laying the foundation for your own data model and analyses. Let's dive in and explore the possibilities together!
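To make the scripted approach concrete, here is a minimal sketch of one `COPY INTO` statement, assuming the CSV files live in a Unity Catalog Volume. The catalog, schema, volume, and file names below are hypothetical placeholders rather than the course's actual paths; substitute the paths you copy from the Catalog Explorer.

```sql
-- Create an empty, schemaless Delta table as the load target
-- (catalog/schema/table names here are illustrative placeholders).
CREATE TABLE IF NOT EXISTS coffee.analysis.re_exports;

-- Load a CSV from a Volume, reading the header row and inferring
-- column types; mergeSchema lets the empty table adopt that schema.
COPY INTO coffee.analysis.re_exports
FROM '/Volumes/coffee/analysis/raw_files/re_exports.csv'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```

Repeating this pair of statements for each file is all the script needs. A nice property of `COPY INTO` is that it is idempotent: rerunning the script skips files that have already been loaded, so it is safe to execute more than once.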

2. Let's practice!