Dataflow

1. Dataflow

Federico: Now, let's discuss how to separate compute and storage with Dataflow. This module contains four sections. In this video, we talk about Dataflow. Dataflow allows you to execute your Apache Beam pipelines on Google Cloud.

There are several reasons why customers love Dataflow so much. First, it is fully managed and autoconfigured. Second, Dataflow optimizes graph execution by fusing operations efficiently and by not waiting for a previous step to finish before starting the next one, unless there is a dependency between them. Third, autoscaling happens step by step in the middle of a pipeline job. As a job needs more resources, it receives them automatically. You don't have to manually scale resources to match job needs, and you don't pay for VM resources that aren't being used. Dataflow scales the workers down as job demand decreases.

All of this happens while maintaining strong streaming semantics. Aggregations like sums and counts are correct, even if the input source sends duplicate records. As we mentioned in a previous course, Dataflow can also handle late-arriving records with intelligent watermarking. Now, let's talk about how to separate compute and storage and save money and time with Dataflow Shuffle service, Dataflow Streaming Engine, and flexible resource scheduling.
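To make the points about duplicate records and late-arriving data more concrete, here is a minimal plain-Python sketch. It is not how Dataflow works internally and it does not use the Apache Beam SDK. The `Record` type, the `windowed_sums` function, the 60-second windows, the 30-second allowed lateness, and the "highest event time seen so far" watermark are all assumptions made for illustration; a real watermark is an estimate maintained by the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str   # hypothetical unique ID attached by the source
    event_time: int  # seconds since the start of the stream
    value: int


def windowed_sums(records, window_size=60, allowed_lateness=30):
    """Sum values per fixed window, skipping duplicates and too-late records."""
    seen_ids = set()
    sums = {}
    dropped = []
    watermark = 0  # toy watermark: highest event time accepted so far

    for r in records:
        if r.record_id in seen_ids:
            continue  # duplicate delivery: must not be counted twice
        seen_ids.add(r.record_id)

        window_start = (r.event_time // window_size) * window_size
        window_end = window_start + window_size

        # A record is too late once the watermark has passed the end of
        # its window by more than the allowed lateness.
        if watermark > window_end + allowed_lateness:
            dropped.append(r)
            continue

        sums[window_start] = sums.get(window_start, 0) + r.value
        watermark = max(watermark, r.event_time)

    return sums, dropped


records = [
    Record("a", 5, 10),
    Record("b", 20, 5),
    Record("a", 5, 10),    # duplicate of "a", ignored
    Record("c", 70, 7),
    Record("d", 50, 3),    # late, but still within the allowed lateness
    Record("e", 200, 1),
    Record("f", 10, 100),  # arrives far too late, dropped
]

sums, dropped = windowed_sums(records)
print(sums)                             # {0: 18, 60: 7, 180: 1}
print([r.record_id for r in dropped])   # ['f']
```

The sum for the first window is 18 rather than 28 because the second copy of record "a" is ignored, and record "d" is still counted even though it arrives after record "c", which belongs to a later window.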

2. Let's practice!
