The Seven Fabric Workloads (Part 1)
1. The Seven Fabric Workloads (Part 1)
Let’s explore the three data engineering workloads in Fabric: Data Factory, Data Engineering, and Data Warehouse.
2. Fabric Workloads

Fabric organizes tools into distinct workloads based on their functions, making it easy to find the right tool for your task. Data Factory, for instance, focuses on ingesting and transforming data. If your task involves data ingestion, you can easily find all relevant tools in Data Factory without sifting through unrelated ones.
3. Data Factory

Let’s begin by exploring the tools within Data Factory. The primary tools here are Dataflow Gen 2 and Data Pipeline.
4. Data Factory

Dataflow Gen 2 ingests data from various sources into the Fabric ecosystem. You can select specific data from sources like a Snowflake database and load it into Fabric’s OneLake via a storage option such as a lakehouse or warehouse. Dataflow’s interface is user-friendly, resembling Power Query in Excel or Power BI, making it accessible to those unfamiliar with programming languages. It is designed for business and data analysts.
5. Data Factory

Data Pipelines are designed to support complex, rule-based data workflows. For example, you may want to create a pipeline that ingests sales data on a weekly basis, but only if all files are present. Data pipelines are aimed at data engineers and other technical personas.
6. Data Warehouse

Next, we have the Data Warehouse workload, which contains the Synapse Data Warehouse. A data warehouse is one of Fabric’s main storage options, the other being a lakehouse. Warehouses store data in structured databases that are queryable with SQL.
7. Data Warehouse

As we saw earlier, you can use tools like Data Pipeline or Dataflow Gen 2 to load data into a warehouse.
8. Data Warehouse

Warehouses support full transactional operations, meaning you can execute both read and write queries using SQL.
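As a quick illustration, here is a minimal Python sketch that runs one read and one write query against a warehouse over its SQL endpoint. The server, database, table, and column names are hypothetical placeholders; in practice you would copy the real SQL connection string from the warehouse settings in Fabric and use an authentication method your environment supports.

```python
import pyodbc

# Hypothetical connection details: replace with your warehouse's
# SQL connection string from the Fabric warehouse settings.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=yourworkspace.datawarehouse.fabric.microsoft.com;"
    "Database=SalesWarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# Read query: aggregate sales by region (table and columns are illustrative).
cursor.execute("SELECT region, SUM(amount) FROM dbo.sales GROUP BY region")
for region, total in cursor.fetchall():
    print(region, total)

# Write query: warehouses also accept DML such as UPDATE and INSERT.
cursor.execute(
    "UPDATE dbo.sales SET amount = amount * 1.1 WHERE region = ?", "EMEA"
)
conn.commit()
conn.close()
```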
9. Data Warehouse

Remember, data in a warehouse is stored as Parquet files in OneLake, allowing all Fabric tools to interact with the same data. This means data engineers using Python and analysts using SQL can work seamlessly with the same dataset.
10. Data Engineering

Finally, let’s discuss the Data Engineering workload, focusing on the lakehouse. Like a warehouse, a lakehouse is a storage option within Fabric, but unlike a warehouse, lakehouses support both structured and unstructured data.
11. Data Engineering

Developers typically interact with lakehouses through notebooks and Python code, using packages like PySpark to transform and analyze data.
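For example, a typical notebook cell might look like the following sketch. It assumes a Fabric notebook, where the `spark` session is already available, and a hypothetical `sales` table in the attached lakehouse.

```python
from pyspark.sql import functions as F

# Read a table from the attached lakehouse (the `spark` session is
# pre-created in Fabric notebooks; the table name is illustrative).
sales = spark.read.table("sales")

# Transform: keep positive amounts and compute weekly revenue.
weekly = (
    sales
    .filter(F.col("amount") > 0)
    .groupBy("week")
    .agg(F.sum("amount").alias("revenue"))
)
weekly.show()
```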
12. Data Engineering

While SQL can be used to read data in a lakehouse, write operations are not supported, so modifying data requires Spark. Developers familiar with Python and Spark tend to use lakehouses, while heavy SQL users might prefer a warehouse.
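Continuing the sketch above, a write therefore goes through Spark rather than SQL, for instance by saving the transformed DataFrame back to the lakehouse as a table (the table name is again illustrative):

```python
# Persist the result as a managed table in the lakehouse. The SQL endpoint
# can then read weekly_revenue, but could not have created or modified it.
(
    weekly.write
    .format("delta")
    .mode("overwrite")  # replace the table if it already exists
    .saveAsTable("weekly_revenue")
)
```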
13. Data Engineering

Similar to warehouses, data in lakehouses is stored in OneLake as Parquet files, making it accessible to other users in the Fabric workspace.
14. Let's practice!

We’ve learned a lot about the Data Factory, Data Warehouse, and Data Engineering workloads. Let’s now dive into Fabric to get some hands-on experience.