Ingesting Data Into Lakehouses

1. Ingesting Data Into Lakehouses

In this chapter, we will dive deeper into Lakehouses in Fabric.

2. Lakehouses recap

Let’s quickly recap what we already know about Lakehouses. Lakehouses are one of Fabric’s storage solutions. Lakehouses can store structured data in the form of Delta Tables, which are stored in Fabric’s OneLake. However, Lakehouses can also store unstructured data: unlike with a warehouse, you can directly upload raw files, like a .csv file, to your Lakehouse. Finally, Lakehouse users tend to interact with their data through Python notebooks or Power BI projects. You can still write SQL queries against this data, but there are some limitations to what you can do using SQL; more on this in the next video.

3. Dataflow and Data Pipelines

Let’s talk about how to add data to your Lakehouse. Conveniently, some of the most common ways of doing this should look familiar. Like with a Warehouse, you can use ingestion tools like Dataflow or Data Pipelines to create tables in your Lakehouse. The process is essentially identical, with the only difference being the target destination of a Lakehouse.

4. Unstructured data

But beyond ingestion tools, you can also directly upload raw files to a lakehouse. Lakehouses are split into Tables and Files; any raw files you add can be found in the Files folder. Once these files are in the lakehouse, you can directly transform them into Delta Tables.

5. Unstructured data to tables

Think of this like a more bare-bones version of Dataflow. You won’t be able to do some of the data manipulation tasks you could do with Dataflow, but the result is the same: raw data from a .csv file is transformed into a Delta Table.
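If you prefer a notebook over the interface, the same file-to-table step can be sketched in PySpark. This is a minimal sketch, not the only way to do it: it assumes a Fabric notebook with a default Lakehouse attached, where a `spark` session already exists. The function name `csv_to_delta_table`, the file `Files/sales.csv`, and the table name `sales` are hypothetical and only there for illustration.

```python
def csv_to_delta_table(spark, csv_path, table_name):
    """Read a raw CSV file and save it as a Delta Table.

    `spark` is the SparkSession a notebook provides; `csv_path` and
    `table_name` are illustrative placeholders, not names from the course.
    """
    df = (
        spark.read.format("csv")
        .option("header", "true")       # treat the first row as column names
        .option("inferSchema", "true")  # let Spark guess the column types
        .load(csv_path)
    )
    # Write the data in Delta format; "overwrite" replaces an existing table.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return df


# Hypothetical usage inside a notebook (path is relative to the attached Lakehouse):
# csv_to_delta_table(spark, "Files/sales.csv", "sales")
```

Passing `spark` in as an argument keeps the function easy to reuse and to test; the overwrite mode is a choice made for this sketch, and appending would be an equally valid option.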

6. Shortcuts

Finally, you can add data to a Lakehouse using the shortcut feature. Think of a shortcut as a pointer to another source of data. For example, if you have data in a warehouse, you could create a shortcut to that data in your lakehouse. This underscores one of Fabric's primary purposes: creating a single, unified source of data. Rather than having several distinct copies of data in multiple locations, Fabric encourages organizations to have a single data lake across their entire enterprise.
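To make the "pointer" idea concrete, here is a rough sketch of what a shortcut definition might look like if you created it programmatically instead of through the interface. It only builds the request and never sends it. The endpoint and field names reflect my understanding of the OneLake shortcuts REST API and should be treated as assumptions to check against the official documentation; the helper name `build_shortcut_request` and all IDs are hypothetical.

```python
# Assumed base URL of the Fabric REST API; verify against the official docs.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_shortcut_request(workspace_id, lakehouse_id, name,
                           target_workspace_id, target_item_id, target_path,
                           folder="Tables"):
    """Build the URL and body for a shortcut that points at other OneLake data.

    Nothing is copied: the body only records *where* the data lives.
    Field names are assumptions based on the OneLake shortcuts API.
    """
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    body = {
        "path": folder,   # where the shortcut appears in the Lakehouse
        "name": name,     # the shortcut's display name
        "target": {
            "oneLake": {  # the source the shortcut points to
                "workspaceId": target_workspace_id,
                "itemId": target_item_id,
                "path": target_path,
            }
        },
    }
    return url, body


# Hypothetical IDs, for illustration only.
url, body = build_shortcut_request(
    "ws-123", "lh-456", "sales_shortcut",
    "ws-123", "wh-789", "Tables/sales",
)
```

Notice that the body contains a location and no data, which is the whole point: the Lakehouse references the source rather than holding a second copy.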

7. Shortcut connections

The really cool thing about shortcuts is that you can create shortcuts to data that is outside of Fabric as well. For example, if your organization has data in Amazon’s S3 storage system or Google Cloud Storage, you can easily create shortcuts to that data in Fabric. As you might expect, there are some permission and credential details that you will need to consider when making shortcuts. However, the takeaway here should be that it is easy for Lakehouses to quickly reference data from other sources.

8. Let's practice!

Let’s jump into some exercises where we will add data to our lakehouse!
