The Seven Fabric Workloads (Part 1)
1. The Seven Fabric Workloads (Part 1)
Let’s explore the three data engineering workloads in Fabric: Data Factory, Data Engineering, and Data Warehouse.
2. Fabric Workloads

Fabric organizes tools into distinct workloads based on their functions, making it easy to find the right tool for your task. Data Factory, for instance, focuses on ingesting and transforming data. If your task involves data ingestion, you can easily find all relevant tools in Data Factory without sifting through unrelated ones.
3. Data Factory

Let’s begin by exploring the tools within Data Factory. The primary tools here are Dataflow Gen 2 and Data Pipeline.
4. Data Factory

Dataflow Gen 2 ingests data from various sources into the Fabric ecosystem. You can select specific data from sources like a Snowflake database and load it into Fabric’s OneLake via a storage option such as a lakehouse or warehouse. Dataflow’s interface is user-friendly, resembling Power Query in Excel or Power BI, making it accessible to those unfamiliar with programming languages. It is designed for business and data analysts.
5. Data Factory

Data Pipelines are designed to support complex, rule-based data workflows. For example, you may want to create a pipeline that ingests sales data on a weekly basis, but only if all files are present. Data pipelines are aimed at data engineers and other technical personas.
6. Data Warehouse

Next, we have the Data Warehouse workload, which contains the Synapse Data Warehouse. A data warehouse is one of Fabric’s main storage options, the other being a lakehouse. Warehouses store data in structured databases that are queryable with SQL.
7. Data Warehouse

As we saw earlier, you can use tools like Data Pipeline or Dataflow Gen 2 to load data into a warehouse.
8. Data Warehouse

Warehouses support full transactional operations, meaning you can execute both read and write queries using SQL.
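As a quick illustration, here is a minimal Python sketch that runs one read and one write query against a warehouse over its SQL endpoint. The server, database, table, and column names are hypothetical placeholders; in practice you would copy the real SQL connection string from the warehouse settings in Fabric and use an authentication method your environment supports.

```python
import pyodbc

# Hypothetical connection details: replace with your warehouse's
# SQL connection string from the Fabric warehouse settings.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=yourworkspace.datawarehouse.fabric.microsoft.com;"
    "Database=SalesWarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# Read query: aggregate sales by region (table and columns are illustrative).
cursor.execute("SELECT region, SUM(amount) FROM dbo.sales GROUP BY region")
for region, total in cursor.fetchall():
    print(region, total)

# Write query: warehouses also accept DML such as UPDATE and INSERT.
cursor.execute(
    "UPDATE dbo.sales SET amount = amount * 1.1 WHERE region = ?", "EMEA"
)
conn.commit()
conn.close()
```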
9. Data Warehouse

Remember, data in a warehouse is stored as Parquet files in OneLake, allowing all Fabric tools to interact with the same data. This means data engineers using Python and analysts using SQL can work seamlessly with the same dataset.
10. Data Engineering

Finally, let’s discuss the Data Engineering workload, focusing on the lakehouse. Like a warehouse, a lakehouse is a storage option within Fabric, but unlike a warehouse, lakehouses support both structured and unstructured data.
11. Data Engineering

Developers typically interact with lakehouses through notebooks and Python code, using packages like PySpark to transform and analyze data.
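For example, a typical notebook cell might look like the following sketch. It assumes a Fabric notebook, where the `spark` session is already available, and a hypothetical `sales` table in the attached lakehouse.

```python
from pyspark.sql import functions as F

# Read a table from the attached lakehouse (the `spark` session is
# pre-created in Fabric notebooks; the table name is illustrative).
sales = spark.read.table("sales")

# Transform: keep positive amounts and compute weekly revenue.
weekly = (
    sales
    .filter(F.col("amount") > 0)
    .groupBy("week")
    .agg(F.sum("amount").alias("revenue"))
)
weekly.show()
```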
12. Data Engineering

While SQL can be used to read data in a lakehouse, write operations are not supported, so modifying data requires Spark. Developers familiar with Python and Spark tend to use lakehouses, while heavy SQL users might prefer a warehouse.
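Continuing the sketch above, a write therefore goes through Spark rather than SQL, for instance by saving the transformed DataFrame back to the lakehouse as a table (the table name is again illustrative):

```python
# Persist the result as a managed table in the lakehouse. The SQL endpoint
# can then read weekly_revenue, but could not have created or modified it.
(
    weekly.write
    .format("delta")
    .mode("overwrite")  # replace the table if it already exists
    .saveAsTable("weekly_revenue")
)
```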
13. Data Engineering

Similar to warehouses, data in lakehouses is stored in OneLake as Parquet files, making it accessible to other users in the Fabric workspace.
14. Let's practice!

We’ve learned a lot about the Data Factory, Data Warehouse, and Data Engineering workloads. Let’s now dive into Fabric to get some hands-on experience.