Summary
1. Summary
person: In the previous videos, you learned how to use state and timers to implement stateful transformations. Remember, there are two types of timers available in a [inaudible]. Processing time timers are good for implementing timeouts, and event time timers are good for emitting output based on data completeness. The time in a processing time timer is relative to the previous messages, and you will have periodic outputs based on that relative time. Event time timers are based on the timestamps of the messages being processed. If you want to make sure that you emit output when the data is complete and you don't expect more data, then use event time timers.
A word of warning about event time timers: always make sure to clear the state after emitting the output. If you leave the state behind, the function will keep waiting for new data, and that will consume resources in your pipeline. In summary, for short and predictable latencies with possibly incomplete results, use processing time timers. For complete outputs with possibly high latency, use event time timers.
Depending on the kind of state that you want to accumulate, you can use different types of state variables. Value state is generic: it can hold a value of any type. If you want to add several elements, use bag state for a more efficient pipeline. A bag returns the objects that were added previously, but with no guarantee of order. Appending objects to a bag is very fast. For any kind of aggregation that is associative and commutative, it is better to use combining state. And if you are going to maintain a set of key-value pairs, a dictionary or map, use map state. With a map, you have random access given a key, and map state is more efficient than other state variables for retrieving specific keys. Finally, there is set state, which is available in the Apache Beam programming model but not supported in Dataflow. You may use bag state for similar purposes instead.
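To make the warning about clearing state concrete, here is a minimal sketch in plain Python. It is a toy simulation, not the Beam API: the class `KeyedBuffer`, its methods, and the `window_end` parameter are hypothetical names chosen for illustration. A per-key list plays the role of bag state, and a per-key timestamp plays the role of an event time timer that fires once the watermark reaches it.

```python
from collections import defaultdict


class KeyedBuffer:
    """Toy model of a stateful per-key function with an event time timer.

    Plain Python, not the Beam API: `bags` stands in for bag state and
    `timers` for one event time timer per key (both are assumptions).
    """

    def __init__(self):
        self.bags = defaultdict(list)  # key -> buffered values (bag state)
        self.timers = {}               # key -> event time at which to emit

    def process(self, key, value, window_end):
        # Append to the bag and (re)set the timer for this key.
        self.bags[key].append(value)
        self.timers[key] = window_end

    def advance_watermark(self, watermark):
        """Fire every timer whose timestamp the watermark has reached."""
        outputs = []
        due = [k for k, t in self.timers.items() if t <= watermark]
        for key in due:
            # A bag gives no ordering guarantee, so sort for a stable result.
            outputs.append((key, sorted(self.bags[key])))
            # Clear state and timer after emitting, as the warning advises.
            del self.bags[key]
            del self.timers[key]
        return outputs


buf = KeyedBuffer()
buf.process("a", 3, window_end=10)
buf.process("a", 1, window_end=10)
buf.process("b", 7, window_end=20)
early = buf.advance_watermark(5)    # no timer is due yet
fired = buf.advance_watermark(10)   # the timer for key "a" fires
remaining_keys = sorted(buf.bags)   # only key "b" still holds state
```

After the timer for key "a" fires, nothing is left behind for that key, so the simulated pipeline holds state only for keys that are still waiting for data.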
In summary, state and timers open up a lot of new possibilities for DoFns. You can implement domain-specific triggering of results based not only on time, but on anything you can think of. There are also applications for slowly changing dimensions, where you keep a dimension table and only a reference to each dimension in a large collection of data. How do you update your large collection of data when the dimension changes? Yes, with state and timers. Another application is joins in streaming, or joining all the elements of one graph with all the elements of another graph, a so-called biclique. You can also apply state and timers to implement such join logic. In any situation where you need fine control over how the aggregation of elements is done, state and timers allow you to implement precise and complex logic. In general, any workflow that should be applied per key can be expressed with state and timers. State and timers are a very powerful feature of [inaudible]. You can implement complex logic in a DoFn and do much more than just maps and filters. We could almost say that the limit of what you can do with state and timers is your imagination.
2. Let's practice!
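As a warm-up, the slowly changing dimension use case mentioned above can be sketched in plain Python. This is one possible reading of the idea, not the Beam API and not necessarily the approach the course has in mind: the class `DimensionEnricher` and the `"dim"` / `"fact"` message kinds are hypothetical. Per key, a single stored value plays the role of value state (the latest dimension), and a list plays the role of bag state (facts that arrived before any dimension was known).

```python
class DimensionEnricher:
    """Toy per-key enrichment against a slowly changing dimension.

    Plain Python, not the Beam API: `dimension` stands in for value state
    and `pending` for bag state (both are assumptions for illustration).
    """

    def __init__(self):
        self.dimension = {}  # key -> latest dimension value (value state)
        self.pending = {}    # key -> facts waiting for a dimension (bag state)

    def process(self, key, kind, payload):
        if kind == "dim":
            # Overwrite the stored dimension and release any buffered facts.
            self.dimension[key] = payload
            waiting = self.pending.pop(key, [])
            return [(key, fact, payload) for fact in waiting]
        # kind == "fact": enrich right away if the dimension is known.
        if key in self.dimension:
            return [(key, payload, self.dimension[key])]
        # Otherwise buffer the fact until a dimension arrives.
        self.pending.setdefault(key, []).append(payload)
        return []


enricher = DimensionEnricher()
out = []
out += enricher.process("u1", "fact", "click")  # buffered, no dimension yet
out += enricher.process("u1", "dim", "free")    # releases the buffered fact
out += enricher.process("u1", "fact", "view")   # enriched with "free"
out += enricher.process("u1", "dim", "paid")    # dimension changes
out += enricher.process("u1", "fact", "buy")    # enriched with "paid"
```

Because all the logic is per key, facts for one key never wait on dimension updates for another key, which is the property that makes this kind of workflow a fit for state and timers.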