Integration Testing

1. Integration Testing

person: Now that we've gotten through the overview of unit testing, let's take a look at integration testing in some detail. An actual pipeline reads from two data sources and writes to BigQuery. Integration tests create a smaller amount of data and assert that the output of the transforms is what we expect.

For large integration tests, we work with data closer to production scale. To do this, we can clone data from a production project to a non-production project. This diagram looks at a batch pipeline reading from two sources on Cloud Storage and writing to BigQuery. We can use the Storage Transfer Service to copy Cloud Storage data. We can copy a BigQuery dataset, or even work with the production dataset as read-only.

Let's take a look at large integration testing for streaming pipelines. One of the nice things about streaming data sources like Cloud Pub/Sub is that you can easily attach extra subscriptions to a topic. This comes at an extra cost, but for any major updates, you should consider cloning the production environment and running through the various lifecycle events. To clone the Pub/Sub stream, you can simply create a new subscription against the production topic. You may also consider doing this on a regular cadence, such as after you have had a certain number of minor updates to your pipelines.

The other option this brings is the ability to carry out A/B testing. This can be dependent on the pipeline and on the update. But if the data you're streaming can be split, for example, on entry to the topic, and the sinks can tolerate different versions of the transforms, then this gives you a great way to ensure everything goes smoothly in production.

In integration tests, we typically test the entire pipeline without sources and sinks. In this example, we see a PTransform subclass called WeatherStatsPipeline that summarizes integers representing weather data. We create a TestPipeline instance and test WeatherStatsPipeline by creating a PCollection of integers and asserting that the result of the pipeline transformations matches the data we expect.
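
Going back to the A/B testing idea, one way to split data on entry to the topic is to assign each message to a variant deterministically from a key, so the same key always reaches the same pipeline version. The sketch below uses only the standard library; the function names, the key format, and the percentage parameter are all hypothetical, and the publishing step itself is left out.

```python
import hashlib


def assign_variant(message_key: str, b_percent: int = 10) -> str:
    """Deterministically map a key to variant 'A' or 'B'.

    b_percent is the share of keys (0 to 100) sent to variant B.
    """
    digest = hashlib.sha256(message_key.encode("utf-8")).digest()
    # Reduce the first 8 bytes of the hash to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "B" if bucket < b_percent else "A"


def route(messages, b_percent: int = 10):
    """Group (key, payload) pairs by the variant their key maps to."""
    routed = {"A": [], "B": []}
    for key, payload in messages:
        routed[assign_variant(key, b_percent)].append((key, payload))
    return routed
```

A publisher could then map each variant label to its own topic or to a message attribute that subscriptions filter on. Which of those fits depends on the pipeline and on whether the sinks can tolerate two versions of the transforms.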

2. Let's practice!

