Utilize DoFn lifecycle

1. Utilize DoFn lifecycle

DoFns play an important role in Dataflow pipelines: they let you transform each input element. In this section, we'll explore how to use the lifecycle of DoFn objects for micro-batching when it is needed.

It's common to invoke external APIs as part of your pipeline. In big data use cases, it is easy to overwhelm an external service endpoint if you make a separate call for each element flowing through the system, especially if you haven't applied any reducing functions.

If you remember what we covered in the Beam concepts review module, this is what the lifecycle of a DoFn looks like. We recommend batching calls to external systems by using the @StartBundle and @FinishBundle lifecycle methods. The code snippet here shows pseudocode for overriding the @StartBundle and @FinishBundle functions of a DoFn. For micro-batching, you can initialize or reset the batch in @StartBundle and commit it in @FinishBundle. Remember that, depending on the runner implementation, @StartBundle and @FinishBundle may be called multiple times, once for each bundle the DoFn instance processes. It is therefore important to reset variables appropriately when using the lifecycle functions of a DoFn.
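To make the micro-batching pattern concrete, here is a minimal plain-Java sketch. It is not the snippet from the slide and it does not depend on Beam: the three methods only mirror where @StartBundle, @ProcessElement and @FinishBundle would sit in a real DoFn, and runBundles stands in for the runner. The names MicroBatchSketch, BatchClient, BatchingFn and runBundles are hypothetical, and the size cap that flushes in the middle of a bundle is an added assumption, not something the lesson prescribes. It uses List.of, so it assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of micro-batching with the DoFn lifecycle (no Beam dependency).
class MicroBatchSketch {

    /** Hypothetical stand-in for an external service that accepts batched requests. */
    interface BatchClient {
        void send(List<String> batch);
    }

    /** Mirrors the shape of a batching DoFn; the method names are illustrative. */
    static class BatchingFn {
        private final BatchClient client;
        private final int maxBatchSize; // assumed cap so one large bundle is not sent as one call
        private List<String> batch;

        BatchingFn(BatchClient client, int maxBatchSize) {
            this.client = client;
            this.maxBatchSize = maxBatchSize;
        }

        // Would carry @StartBundle in Beam: reset the buffer for every bundle.
        void startBundle() {
            batch = new ArrayList<>();
        }

        // Would carry @ProcessElement in Beam: buffer instead of calling the service per element.
        void processElement(String element) {
            batch.add(element);
            if (batch.size() >= maxBatchSize) {
                flush();
            }
        }

        // Would carry @FinishBundle in Beam: commit whatever is still buffered.
        void finishBundle() {
            flush();
        }

        private void flush() {
            if (!batch.isEmpty()) {
                client.send(new ArrayList<>(batch)); // send a copy, then reuse the buffer
                batch.clear();
            }
        }
    }

    /** Simulates a runner that reuses one instance across several bundles. */
    static void runBundles(BatchingFn fn, List<List<String>> bundles) {
        for (List<String> bundle : bundles) {
            fn.startBundle();
            for (String element : bundle) {
                fn.processElement(element);
            }
            fn.finishBundle();
        }
    }

    public static void main(String[] args) {
        List<List<String>> sent = new ArrayList<>();
        BatchingFn fn = new BatchingFn(sent::add, 2);
        runBundles(fn, List.of(List.of("a", "b", "c"), List.of("d")));
        System.out.println(sent); // prints [[a, b], [c], [d]]
    }
}
```

Four elements across two bundles lead to three service calls rather than four, and the gap widens as bundles grow. Because startBundle resets the buffer every time, nothing leaks from one bundle into the next when the same instance is reused.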

2. Let's practice!
