Container Environments

1. Container Environments

Federico: In this video, we'll look at container environments. The Beam SDK runtime environment can be containerized with Docker to isolate it from other runtime systems. Each user operation has an associated environment in which to execute. Typically, supported SDKs provide a default environment that you can further customize. Because the environment is containerized, dependencies can be installed ahead of time, when the image is built, rather than when the workers start. You can include arbitrary dependencies, and even further customization is possible.

Now, let's see how you can run your pipeline with custom containers. To use this feature, you need to have the Apache Beam SDK version 2.25.0 or later installed. If you want to test your pipeline locally, you will also need to have Docker installed.

To create a custom container image, create a Dockerfile in which you specify the Apache Beam image as the parent image. Then add your own customizations. After creating your custom Dockerfile, you need to build the image and push it to a container registry. To do so, you need to specify your project, the name of the image repository, the tag that you want to associate with your image, and the image registry host name. Then you can use either Cloud Build or Docker to build the image and push it to a container registry such as gcr.io.

Finally, you can launch your Dataflow job by passing the regular parameters along with the location of the custom container image.
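The build, push, and launch steps described here can be sketched in Python as a small helper that assembles the image URI and the matching commands. This is only an illustration under stated assumptions: the helper names, the project `my-project`, the repository `beam-custom`, the tag `v1`, the region, and the bucket are all hypothetical placeholders. The `--sdk_container_image` option is the name used by newer Beam SDK releases; releases closer to 2.25.0 may expect a different option name, so check the documentation for your SDK version.

```python
"""Sketch: assemble the image URI and commands for a custom Beam container.

All project, repository, tag, region, and bucket values used below are
hypothetical placeholders, not values from the course.
"""


def image_uri(host: str, project: str, repo: str, tag: str) -> str:
    """Combine registry host, project, repository, and tag into one URI."""
    return f"{host}/{project}/{repo}:{tag}"


def docker_commands(uri: str, context_dir: str = ".") -> list:
    """Commands to build the image with Docker and push it to the registry."""
    return [
        ["docker", "build", "-t", uri, context_dir],
        ["docker", "push", uri],
    ]


def cloud_build_command(uri: str, context_dir: str = ".") -> list:
    """Alternative: let Cloud Build build and push the image in one step."""
    return ["gcloud", "builds", "submit", "--tag", uri, context_dir]


def dataflow_args(project: str, region: str, temp_location: str, uri: str) -> list:
    """Regular Dataflow parameters plus the custom container location.

    Assumption: the SDK version in use accepts --sdk_container_image.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        f"--sdk_container_image={uri}",
    ]


if __name__ == "__main__":
    uri = image_uri("gcr.io", "my-project", "beam-custom", "v1")
    for cmd in docker_commands(uri):
        print(" ".join(cmd))
    print(" ".join(cloud_build_command(uri)))
    print(" ".join(dataflow_args("my-project", "us-central1", "gs://my-bucket/tmp", uri)))
```

The argument list returned by `dataflow_args` is the kind of list you could pass to your pipeline's options when launching the job. The commands are only printed here, so nothing is built or pushed until you run them yourself.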

2. Let's practice!
