Handling un-processable data

1. Handling un-processable data

person: The next section covers handling un-processable or erroneous records in a pipeline. In a real-world use case, we should design our pipelines to handle data that is not in the desired or expected format. Erroneous records may cause the pipeline to fail if they are not handled properly. When faced with erroneous records, rather than just logging the issue, send the [inaudible] to a persistent storage medium, such as BigQuery or Cloud Storage, so they can be handled separately. Use tuple tags to access multiple outputs from the resulting PCollection. Here we can see the [inaudible] code to implement the dead-letter sink pattern. Consider wrapping user code inside a process element function with a try-catch block. Inside the try-catch block, avoid logging every exception, as it may overwhelm the whole pipeline, especially when [inaudible] increases. Instead, send the erroneous records to an alternative dead-letter sink. Line 11 in this snippet shows an erroneous record being sent to a side output using the dead-letter tag. Finally, it is written to a different sink at line 15.
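The snippet the speaker refers to is not part of this transcript. As a rough stand-in, here is a minimal, framework-free Python sketch of the same dead-letter idea: a process-element function wraps the parsing logic in a try/except block and emits each record under either a main tag or a dead-letter tag. The tag names, the JSON input format, and the "id" validation rule are all assumptions made for illustration. A real pipeline would use the framework's own tagged outputs (tuple tags) and write each output to a real sink rather than collecting into lists.

```python
import json
from typing import Iterable, Iterator, Tuple

# Hypothetical tag names; a real pipeline would use the framework's
# tuple tags / tagged outputs instead of plain strings.
MAIN_TAG = "parsed"
DEAD_LETTER_TAG = "dead_letter"


def process_element(raw: str) -> Iterator[Tuple[str, object]]:
    """Parse one record, routing failures to the dead-letter output."""
    try:
        record = json.loads(raw)
        # Assumed validation rule for illustration: every record needs an "id".
        if not isinstance(record, dict) or "id" not in record:
            raise ValueError("missing required field 'id'")
        yield MAIN_TAG, record
    except ValueError as err:
        # json.JSONDecodeError subclasses ValueError, so malformed JSON and
        # failed validation both land here. Nothing is logged per record;
        # the raw input and the reason are kept for separate handling.
        yield DEAD_LETTER_TAG, {"raw": raw, "error": str(err)}


def run(records: Iterable[str]) -> dict:
    """Split the input into a main output and a dead-letter output."""
    outputs = {MAIN_TAG: [], DEAD_LETTER_TAG: []}
    for raw in records:
        for tag, value in process_element(raw):
            outputs[tag].append(value)
    return outputs


results = run(['{"id": 1}', 'not json', '{"name": "no id"}'])
```

In this sketch `results["parsed"]` stands in for the main output and `results["dead_letter"]` for the side output. In practice the dead-letter output would be written to persistent storage such as BigQuery or Cloud Storage, where the failed records can be inspected and reprocessed without blocking the main flow.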

2. Let's practice!
