Data Engineering in Microsoft Fabric

1. Data Engineering in Microsoft Fabric

In this video, we’ll learn about the role that the Fabric Data Engineering experience plays in the lifecycle of data analytics solutions.

2. Data Analytics End To End

Building a data analytics solution requires the ability to ingest data from source systems, store the ingested data in a central location, prepare and transform the data so it's suitable for analysis, and query the data so it can be visualized and analyzed.

3. Data Factory

Data Factory supports ingesting data from multiple data sources into OneLake. The Data Factory experience provides two items with capabilities that support data ingestion and transformation activities. These items are named Dataflows and Data Pipelines.

4. Dataflows

Dataflows provide a low-code interface for ingesting and transforming data. They use the Power Query transformation engine familiar to users of Microsoft Excel and Power BI. Dataflows provide connectors to a large number of data sources, and support transformations such as joins and aggregations.
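Dataflows themselves are authored visually in Power Query rather than in code, so there is nothing to type here. Purely as an illustration of the kind of join and aggregation a Dataflow performs, here is the equivalent logic sketched in PySpark; the table and column names are hypothetical, and spark is the session a Fabric notebook provides.

```python
# Illustration only: the join + aggregation pattern a Dataflow might perform,
# expressed here in PySpark. Table and column names are hypothetical.
from pyspark.sql import functions as F

orders = spark.read.table("orders")
customers = spark.read.table("customers")

sales_by_country = (
    orders.join(customers, on="customer_id", how="inner")
          .groupBy("country")
          .agg(F.sum("amount").alias("total_sales"))
)
sales_by_country.show()
```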

5. Data Pipelines

A pipeline is a group of activities that together perform a task. For example, a pipeline might ingest data from a source and then clean it to prepare the data for reporting. Data Factory has three types of activities: data movement, data transformation, and control activities. For example, the Copy Activity copies data between data stores. Other activities can run Dataflows, Notebooks, and stored procedures. Control activities include If Conditions and ForEach loops.
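Pipelines are normally assembled in the Data Factory design canvas rather than written by hand. As a rough conceptual sketch only, and not the actual Fabric pipeline definition format, the structure of a pipeline that copies data, runs a Dataflow, and loops over a list of regions might be described like this:

```python
# Conceptual sketch only: this is NOT the real Fabric pipeline schema.
# It simply illustrates how activities of different types compose into a pipeline.
pipeline = {
    "name": "ingest_sales",  # hypothetical pipeline name
    "activities": [
        {"type": "Copy", "source": "sql_sales_db", "sink": "lakehouse_raw"},   # data movement
        {"type": "Dataflow", "dataflow": "clean_sales"},                       # data transformation
        {"type": "ForEach", "items": "regions",                                # control activity
         "activities": [{"type": "Notebook", "notebook": "aggregate_region"}]},
    ],
}
```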

6. Synapse Data Engineering

The Synapse Data Engineering experience provides items that support storage and processing of large volumes of data. These items include Lakehouses for data storage, and Notebooks and Apache Spark jobs for data processing.

7. Lakehouses

Lakehouses are collections of files and folders, built on OneLake, that support storing and managing structured and unstructured data in a single location.
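For example, assuming a Fabric notebook attached to a lakehouse and a hypothetical sales.csv file already uploaded to its Files area, a minimal sketch of both sides of the lakehouse looks like this: raw files are read from the file store and saved as a managed Delta table.

```python
# Minimal sketch, assuming a Fabric notebook attached to a lakehouse with a
# hypothetical Files/raw/sales.csv already uploaded. The spark session is
# provided automatically by the notebook.
raw = (
    spark.read
         .option("header", True)
         .csv("Files/raw/sales.csv")   # the file (unstructured) side of the lakehouse
)

# Save as a managed Delta table: the structured, queryable side of the lakehouse.
raw.write.format("delta").mode("overwrite").saveAsTable("sales")
```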

8. Notebooks

Notebooks provide an interactive web interface for data processing. You can create and share documents that contain code, visualizations, and commentary text. Notebooks support multiple languages, including Python, R, and Scala. Notebooks can be used for data ingestion, preparation, analysis, and other data-related tasks.
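For example, a notebook cell might interactively explore the hypothetical sales table created in the previous sketch, again using the PySpark session that Fabric notebooks provide.

```python
# A typical interactive notebook cell: query, aggregate, and inspect results.
# Assumes the hypothetical sales table exists; column names are made up.
from pyspark.sql import functions as F

sales = spark.read.table("sales")

monthly = (
    sales.groupBy("month")
         .agg(F.sum("amount").alias("revenue"))
         .orderBy("month")
)

monthly.show()      # plain tabular output
display(monthly)    # Fabric notebooks also offer rich, chart-style output via display()
```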

9. Apache Spark Job Definitions

A Spark job definition is a set of parameters that allows you to specify and submit batch or streaming jobs to the Spark cluster. Notebooks are good for data exploration and prototyping, while Spark job definitions are more suitable for scheduling production-ready data processing tasks.
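A minimal sketch of the main definition file for such a job is shown below; it is a hypothetical batch script that takes a run date as a parameter, and the command-line arguments would be supplied as the job definition's parameters.

```python
# main.py: a minimal sketch of a batch script submitted through a Spark job definition.
# The table names and the --run-date parameter are hypothetical.
import argparse

from pyspark.sql import SparkSession, functions as F


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-date", required=True)  # supplied as a job parameter
    args = parser.parse_args()

    # Unlike a notebook, a batch job creates (or retrieves) its own Spark session.
    spark = SparkSession.builder.appName("daily_sales_load").getOrCreate()

    daily = (
        spark.read.table("sales")
             .filter(F.col("order_date") == args.run_date)
    )
    daily.write.format("delta").mode("append").saveAsTable("sales_daily")

    spark.stop()


if __name__ == "__main__":
    main()
```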

10. Synapse Data Warehouse

Synapse Data Warehouse supports the creation of SQL-compatible data warehouses. Data is stored in OneLake using the open Delta Lake format, enabling use with other Fabric workloads without having to create multiple copies of data.
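For instance, because a warehouse table is stored as Delta files in OneLake, a Spark notebook can read it directly instead of exporting a copy. This is a sketch with a placeholder OneLake path; the exact path depends on your workspace and warehouse item names.

```python
# Sketch: reading a warehouse table from Spark through its Delta files in OneLake,
# rather than copying the data out. The path below is a placeholder to substitute
# with your actual workspace and warehouse item names.
warehouse_table_path = "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<warehouse item>/Tables/dbo/<table>"

fact_sales = spark.read.format("delta").load(warehouse_table_path)
fact_sales.printSchema()
```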

11. Choosing a Data Store

Data can be stored in a Lakehouse, a Warehouse, or a combination of both. A Lakehouse is the best fit when the data includes unstructured data files and the development team is strong in Spark. A Warehouse is the best fit when the data consists of structured tables and the team's expertise is in SQL.

12. Choosing a Data Copy Tool

Data can be ingested into OneLake using Data Factory pipelines, Dataflows, or code in Spark notebooks. Pipelines offer a low-code solution aimed at SQL developers. They provide a basic set of connectors and simple transformation functions.

13. Choosing a Data Copy Tool

Dataflows also require little code. They are aimed at Power Query M developers and provide a large set of connectors and more complex transformation functions.

14. Choosing a Data Copy Tool

Spark notebooks require coding and are aimed at experienced Spark developers familiar with Spark libraries.

15. Let's practice!

Now, let's do a couple of exercises to practice using these Fabric experiences.