Get startedGet started for free

PubsubIO

1. PubsubIO

Pub/Sub is Google’s highly scalable and robust messaging service. Dataflow and Pub/Sub often go hand in hand, and you can connect them using PubsubIO. In this Java example, you are reading from a Pub/Sub topic using PubsubIO. This read method automatically creates a subscription when the Dataflow job is deployed, and is destroyed upon termination of the job. If you would like to have a subscription remain upon termination of the job, create a subscription and use the fromSubscription method. Dataflow PubsubIO automatically acknowledges the messages when the data is durably persisted in Dataflow, in other words when it is materialized in the service. By default, Pub/Sub’s message timestamp is used for windowing features. It is possible to reassign that value to take advantage of other timestamps. One example would be if you had a reported timestamp from the source, versus the published timestamp in the message. You could use the reported timestamp to compute your windows and other calculations. A common scenario in any data pipeline is the ability to capture failures. You want to be able to capture these failures so that you can act upon them. One method is to use a dead letter queue. A dead letter queue is a queue where you divert messages that meet one or more criteria. This could be a failure or a piece of data that you may need a second look at. Here’s an example of reading from a Kafka system and publishing it to Pub/Sub. There are two scenarios and you can see the failed messages are diverted to a second topic.

2. Let's practice!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.