Splittable DoFn
1. Splittable DoFn
Splittable Do Functions, or splittable DoFns, enhance both sources and the generic Do Function. Going forward, sources will be able to use splittable Do Functions for both streaming and batch. This brings Beam's unified batch and streaming programming model closer to fruition.

A splittable Do Function is a generalization of a Do Function that gives it the core capabilities of a source: splittability, and the ability to report back the metrics that you learned about earlier in this module, such as progress. Progress and other metrics let you know how far along a bundle is and how far it has to go. This is what makes it possible to split the work into multiple bundles. The splittable Do Function also keeps the syntax, flexibility, modularity, and ease of coding of the Do Function.

When you read a file, the splittable Do Function lets you set a restriction, such as a sequence of blocks, on which part of the file is read. Splittable Do Functions allow you to build custom sources with ease. The function is a Do Function with additional parameters, such as RestrictionTracker, as shown in this Java example. You need to define an initial restriction, which describes a complete unit of work. This is shown with the function "def initial_restriction" in this pipeline example.

One way to accelerate your development of a Dataflow pipeline is to use the open source code as a basis for your own. There are many examples in Python and in Java in the links provided. Thanks for listening.

2. Let's practice!
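To make the ideas of a restriction, progress, and splitting concrete, here is a minimal sketch in plain Python. It does not use the Beam SDK: OffsetRange, OffsetTracker, read_blocks, and the block names are hypothetical stand-ins written for illustration, and only the name initial_restriction is borrowed from the example mentioned in the lesson. Beam's own restriction trackers are richer and may behave differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical restriction: the half-open block range [start, stop)."""
    start: int
    stop: int


class OffsetTracker:
    """Hypothetical tracker (not Beam's class): records claimed positions."""

    def __init__(self, restriction: OffsetRange):
        self.restriction = restriction
        self.last_claimed: Optional[int] = None

    def next_position(self) -> int:
        """First position that has not been claimed yet."""
        if self.last_claimed is None:
            return self.restriction.start
        return self.last_claimed + 1

    def try_claim(self, position: int) -> bool:
        """Claim a position; False once it falls outside the restriction."""
        if position >= self.restriction.stop:
            return False
        self.last_claimed = position
        return True

    def progress(self) -> Tuple[int, int]:
        """Return (completed, remaining) work, measured in blocks."""
        done = self.next_position() - self.restriction.start
        return done, self.restriction.stop - self.next_position()

    def try_split(self, fraction: float) -> Optional[OffsetRange]:
        """Keep `fraction` of the unclaimed work, hand back the rest."""
        remaining = self.restriction.stop - self.next_position()
        split_at = self.next_position() + int(remaining * fraction)
        if split_at >= self.restriction.stop:
            return None  # nothing left to give away
        residual = OffsetRange(split_at, self.restriction.stop)
        self.restriction = OffsetRange(self.restriction.start, split_at)
        return residual


def initial_restriction(num_blocks: int) -> OffsetRange:
    """A restriction describing the complete unit of work: every block."""
    return OffsetRange(0, num_blocks)


def read_blocks(blocks: List[str], tracker: OffsetTracker,
                limit: Optional[int] = None) -> List[str]:
    """Read from the next unclaimed block until a claim fails or limit is hit."""
    out = []
    position = tracker.next_position()
    while (limit is None or len(out) < limit) and tracker.try_claim(position):
        out.append(blocks[position])
        position += 1
    return out


blocks = [f"block-{i}" for i in range(10)]
tracker = OffsetTracker(initial_restriction(len(blocks)))

first = read_blocks(blocks, tracker, limit=4)   # start on the work
progress_before = tracker.progress()            # how far along, how far to go
residual = tracker.try_split(0.5)               # give away half of what is left
rest = read_blocks(blocks, tracker)             # finish the shrunken restriction
other = read_blocks(blocks, OffsetTracker(residual))  # a second bundle

print(progress_before, residual)
print(first + rest + other == blocks)
```

The point of the sketch is that the reader only ever processes positions it has successfully claimed. Because the tracker always knows the last claimed position, it can report progress and hand the unclaimed tail to another bundle without any block being read twice or skipped.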