Geolocation

1. Geolocation

person: In this section, we'll talk about how the location you choose for processing your data can impact the reliability of your pipeline. Dataflow is a regional service. When a user submits a job to a regional endpoint without explicitly specifying a zone, the Dataflow service routes the job to a zone in the specified region based on resource availability. In other words, Dataflow picks the best zone for the job based on available capacity. If you explicitly specify a zone, you will not get this benefit. If a job submission fails due to a zone issue, retrying without explicitly specifying a zone will usually fix the issue. This is a helpful technique in the event of a zonal outage.

Note that you cannot change the location of a job after it has started. If it is a streaming job, you will have to drain or cancel the pipeline before launching it again. This applies whether you relaunch the job in the same region without a zone specified or in an entirely different region.

When thinking about the location of your Dataflow job, there are three elements to be aware of: your sources, your processing, and your sinks. You should always locate these resources in the same region. For an additional layer of reliability, you can also elect to use multiregional sources and sinks. Services like Google Cloud Storage, BigQuery, and Pub/Sub provide geo-redundant options that make your data seamlessly accessible in multiple regions. Dataflow processing can only occur in one region. But in the event of a regional outage, using multiregional sources and sinks allows you to move your data processing to a different region without suffering a performance penalty.

You should try to avoid any configuration that has a critical cross-region dependency. If you have a pipeline that has a critical dependency on services from multiple regions, your pipeline is likely to be affected by a failure in any of those regions. For example, a pipeline that reads from a Cloud Storage bucket in us-central1 and writes to a BigQuery table in us-east4 could go down if either one of those two regions is down.
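As a rough illustration of the two ideas above, here is a minimal Python sketch. The helper names `dataflow_flags` and `critical_regions`, the project and bucket names, and the short list of multi-region labels are all assumptions made for this example. The flags `--runner`, `--project`, `--region`, `--temp_location`, and `--worker_zone` are Apache Beam pipeline options. The first helper builds the flags for a job and leaves the zone out unless you ask for one, so the service can pick a zone in the region. The second lists the single regions a pipeline depends on, so a cross-region dependency shows up as more than one entry.

```python
from typing import AbstractSet, Dict, List, Optional, Set

# Assumption for this sketch: labels treated as multi-region locations.
MULTI_REGIONS = frozenset({"us", "eu", "asia"})


def dataflow_flags(project: str, region: str, temp_location: str,
                   worker_zone: Optional[str] = None) -> List[str]:
    """Build Beam pipeline flags for a Dataflow job (hypothetical helper).

    Leaving worker_zone as None omits the zone flag, so the Dataflow
    service chooses a zone in the region based on available capacity.
    """
    flags = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]
    if worker_zone is not None:
        flags.append(f"--worker_zone={worker_zone}")
    return flags


def critical_regions(job_region: str,
                     resource_locations: Dict[str, str],
                     multi_regions: AbstractSet[str] = MULTI_REGIONS) -> Set[str]:
    """Return the single regions the pipeline depends on (hypothetical helper).

    Multi-region locations are skipped because they are not tied to one
    region. More than one entry in the result means a cross-region dependency.
    """
    regions = {job_region.lower()}
    for location in resource_locations.values():
        if location.lower() not in multi_regions:
            regions.add(location.lower())
    return regions


if __name__ == "__main__":
    # Placeholder project and bucket names.
    print(dataflow_flags("my-project", "us-central1", "gs://my-bucket/tmp"))
    # The cross-region example from the transcript.
    print(sorted(critical_regions(
        "us-central1", {"source": "us-central1", "sink": "us-east4"})))
```

If a submission with an explicit `worker_zone` fails because of a zone issue, calling `dataflow_flags` again without that argument matches the retry technique described above.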

2. Let's practice!
