Data Ingestion and Scheduling in Dataflows Gen 2
1. Data Ingestion and Scheduling in Dataflows Gen 2
In this video, we'll dive into connecting data destinations, choosing table options, managing settings, and optimizing dataflow refreshes!
2. Data Destination Connections and Table Options

Let's start with how to configure connections to a data destination in Dataflows Gen2. Take a look at the image on the right to get an idea of how these connections are set up. You'll configure the connection by providing details such as the location, authentication type, and privacy settings. You can either create a new connection or reuse an existing one. Once your connection is configured, you'll have two options: creating a new table or using an existing one. Keep in mind that if a new table is deleted, it will be automatically recreated during the next dataflow refresh, but if you choose an existing table and it's deleted, Dataflows Gen2 will not recreate it. We'll implement this in the coming exercises for better understanding.
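To keep these settings straight, here is a minimal Python sketch that models them as a plain data structure. It is purely illustrative: Dataflows Gen2 connections are configured through the Fabric UI, and every name and value below (the DestinationConnection class, the URL, the literal options) is a hypothetical stand-in rather than a real API.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class DestinationConnection:
    """Hypothetical container for the connection and table settings above."""
    location: str                      # where the destination lives
    authentication_type: str           # e.g. "OAuth2" or "ServicePrincipal"
    privacy_level: Literal["None", "Private", "Organizational", "Public"]
    table_option: Literal["new", "existing"]  # "new" tables are recreated if deleted
    table_name: str

conn = DestinationConnection(
    location="https://contoso.example/lakehouse",  # placeholder URL
    authentication_type="OAuth2",
    privacy_level="Organizational",
    table_option="new",
    table_name="SalesOrders",
)
```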
3. Managed Settings

Now let's discuss managed settings. These are automatically enabled when loading data into a new table in Dataflows Gen2 and provide three key features. First, the update method fully replaces the data with each refresh, ensuring the table always reflects the latest data. Second, managed mapping automatically handles schema changes, like adding or modifying columns, without needing manual adjustments. Lastly, the table is dropped and recreated during each refresh to apply schema updates. However, be aware that this process may remove any previously added relationships or measures.
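As an analogy for what replace-and-recreate means in practice, here is a short Python sketch using pandas and SQLite as stand-ins for the destination. Dataflows Gen2 does not use this API; the point is that `if_exists="replace"` drops the table and rebuilds it from the incoming schema, mirroring the managed behavior described above.

```python
import sqlite3

import pandas as pd

db = sqlite3.connect("destination.db")  # stand-in for the data destination

def managed_refresh(df: pd.DataFrame) -> None:
    # "replace" drops the table and recreates it from df's current schema,
    # so new or changed columns are picked up automatically; anything defined
    # on the old table (relationships, measures) would be lost in the process.
    df.to_sql("SalesOrders", db, if_exists="replace", index=False)

managed_refresh(pd.DataFrame({"order_id": [1, 2], "amount": [10.0, 20.5]}))
# A later refresh with an extra column succeeds: the table is simply recreated.
managed_refresh(pd.DataFrame({"order_id": [3], "amount": [7.25], "region": ["EU"]}))
```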
4. Manual Settings

Now, let's dive into manual settings! Disabling the automatic settings gives you full control over how data is loaded. You can map columns, exclude unnecessary ones, and even choose the update method and schema options. For the update method, you have two options: replace, which completely overwrites the existing data, and append, which adds new data without deleting what's already there. Next, we have schema options, which come into play only with replace. Dynamic schema allows schema changes but drops and recreates the table during each refresh, whereas fixed schema keeps relationships and measures intact but fails the refresh if the schema doesn't match. We'll put this into practice in the upcoming exercises!
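Here is a companion sketch for the manual options, again with pandas and SQLite standing in for the destination rather than the Dataflows Gen2 engine itself. It contrasts append with replace, and shows why a fixed schema preserves the table (and whatever depends on it) while failing on a mismatch; the function and table names are hypothetical.

```python
import sqlite3

import pandas as pd

db = sqlite3.connect("destination.db")

def manual_refresh(df: pd.DataFrame, method: str = "replace",
                   schema: str = "dynamic") -> None:
    if method == "append":
        # Append: add new rows without touching what is already there.
        df.to_sql("Sales", db, if_exists="append", index=False)
    elif schema == "dynamic":
        # Replace + dynamic schema: drop and recreate, allowing schema changes.
        df.to_sql("Sales", db, if_exists="replace", index=False)
    else:
        # Replace + fixed schema: keep the table itself, but the incoming
        # columns must match exactly or the refresh fails.
        existing = pd.read_sql_query("SELECT * FROM Sales LIMIT 0", db)
        if list(existing.columns) != list(df.columns):
            raise ValueError("Schema mismatch: fixed-schema refresh fails")
        db.execute("DELETE FROM Sales")  # replace the rows, not the table
        df.to_sql("Sales", db, if_exists="append", index=False)
```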
5. Managing Dataflow Refreshes

A refresh in dataflows is the process of applying the defined transformation steps so your data stays up to date in the destination. It's essential for keeping your data current for downstream consumption. There are two ways to trigger a dataflow refresh: on demand or on a schedule. On-demand refreshes can be started manually, triggered automatically after publishing a dataflow, or initiated through a pipeline activity. Alternatively, scheduled refreshes automatically refresh the dataflow at predefined intervals, up to 48 times a day, without manual intervention. You can also cancel a refresh in progress, which is useful when resources are constrained or refresh times are longer than expected.
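On-demand refreshes can also be requested programmatically. Below is a hedged Python sketch following the shape of the Power BI REST API's dataflow refresh endpoint; whether a given Gen2 dataflow is reachable through this endpoint, and how you obtain the bearer token, are assumptions to check against the current Fabric documentation, and all IDs are placeholders.

```python
# Hedged sketch: requesting an on-demand dataflow refresh over REST.
# Endpoint shape follows the Power BI "Refresh Dataflow" API; verify that it
# applies to your Dataflows Gen2 item before relying on it.
import requests

WORKSPACE_ID = "<workspace-guid>"  # placeholder
DATAFLOW_ID = "<dataflow-guid>"    # placeholder
TOKEN = "<aad-bearer-token>"       # placeholder; acquired via Azure AD

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
    f"/dataflows/{DATAFLOW_ID}/refreshes"
)
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"notifyOption": "MailOnFailure"},  # optional failure notification
)
resp.raise_for_status()  # success means the refresh request was accepted
```

For the scheduled path, the 48-refreshes-per-day ceiling works out to at most one refresh every 30 minutes.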
6. Let's practice!

Great job on understanding the concepts; now let's see them in action!