Deploy an ETL pipeline on Kubernetes

You will deploy an ETL pipeline on Kubernetes. Your Extract, Transform, and Load steps will each run in a Pod, and each Pod reads from and writes to Persistent Volumes requested through Persistent Volume Claims.
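To make the Pod, Persistent Volume Claim, and volume relationship concrete, here is a minimal sketch that builds the objects as Python dicts and prints them as a JSON `List`, which `kubectl apply -f` accepts. It is not the content of the prepared Manifests/ directory: every claim name, mount path, and storage size below is hypothetical, and only the image tags (extract:v1, transform:v1, load:v1) come from this exercise.

```python
import json


def pvc_manifest(name: str, storage: str = "1Gi") -> dict:
    """PersistentVolumeClaim; the cluster binds it to (or provisions) a Persistent Volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": storage}},  # hypothetical size
        },
    }


def pod_manifest(name: str, image: str, volumes: dict) -> dict:
    """Pod that mounts each claim in `volumes` (claim name -> mount path)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # each ETL step runs once and exits
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "volumeMounts": [
                        {"name": claim, "mountPath": path}
                        for claim, path in volumes.items()
                    ],
                }
            ],
            # Each Pod volume is backed by the claim of the same (hypothetical) name.
            "volumes": [
                {"name": claim, "persistentVolumeClaim": {"claimName": claim}}
                for claim in volumes
            ],
        },
    }


if __name__ == "__main__":
    manifests = [
        pvc_manifest("extract-data"),
        pvc_manifest("transform-data"),
        pvc_manifest("load-data"),
        # Each step reads the previous step's claim and writes to its own.
        pod_manifest("extract", "extract:v1", {"extract-data": "/data/out"}),
        pod_manifest("transform", "transform:v1",
                     {"extract-data": "/data/in", "transform-data": "/data/out"}),
        pod_manifest("load", "load:v1",
                     {"transform-data": "/data/in", "load-data": "/data/out"}),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Saving the output to a file such as pipeline.json and running `kubectl apply -f pipeline.json` would create the objects. Note that this sketch only declares them; it does not make the Transform Pod wait for the Extract Pod to finish.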

Your task is to find the total number of passengers who rode in a NYC yellow cab in a group of 2 or more. Your "Extract Pod" will prepare the initial data as a CSV file and hand it over to the "Transform Pod". That Pod will load the yellow cab data into an SQLite database, select the data needed for the final computation, and hand it over to the "Load Pod". This final Pod will sum the passenger counts, display the total, and save it as a CSV file.
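The three steps can be sketched locally in plain Python, with one temporary directory standing in for each Pod's volume. This is a hypothetical illustration, not the code inside the prepared Docker images: the sample passenger counts, file names, the `passenger_count` column name, and the CSV hand-off between Transform and Load are all assumptions.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical sample data standing in for the real NYC yellow cab dataset.
SAMPLE_PASSENGER_COUNTS = [1, 2, 4, 1, 3]


def extract(extract_dir: Path) -> Path:
    """Extract: write the raw data as a CSV file on the extract 'volume'."""
    csv_path = extract_dir / "yellow_cab.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["passenger_count"])
        writer.writerows([n] for n in SAMPLE_PASSENGER_COUNTS)
    return csv_path


def transform(csv_path: Path, transform_dir: Path) -> Path:
    """Transform: load the CSV into SQLite, then select only rides with 2+ passengers."""
    db_path = transform_dir / "yellow_cab.db"
    conn = sqlite3.connect(str(db_path))
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS trips (passenger_count INTEGER)")
        conn.execute("DELETE FROM trips")  # keep reruns from duplicating rows
        with csv_path.open(newline="") as f:
            conn.executemany(
                "INSERT INTO trips (passenger_count) VALUES (?)",
                ((int(row["passenger_count"]),) for row in csv.DictReader(f)),
            )
        conn.commit()
        selected = conn.execute(
            "SELECT passenger_count FROM trips WHERE passenger_count >= 2"
        ).fetchall()
    finally:
        conn.close()
    # Hand-off file for the Load step (the CSV format is an assumption).
    selected_path = transform_dir / "group_rides.csv"
    with selected_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["passenger_count"])
        writer.writerows(selected)
    return selected_path


def load(selected_path: Path, load_dir: Path) -> int:
    """Load: sum the passengers, print the total, and save it as a CSV file."""
    with selected_path.open(newline="") as f:
        total = sum(int(row["passenger_count"]) for row in csv.DictReader(f))
    print(f"Passengers riding in groups of 2 or more: {total}")
    with (load_dir / "total_passengers.csv").open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["total_passengers"])
        writer.writerow([total])
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # One directory per step, standing in for each Pod's volume mount.
        dirs = {name: Path(root) / name for name in ("extract", "transform", "load")}
        for d in dirs.values():
            d.mkdir()
        raw = extract(dirs["extract"])
        selected = transform(raw, dirs["transform"])
        load(selected, dirs["load"])
```

With the sample counts above, only the rides with 2, 4, and 3 passengers pass the filter, so the script prints a total of 9.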

All of these steps use standard Kubernetes objects that you already know. Two directories, "Docker/" and "Manifests/", have been prepared: they hold the files needed to build the Docker images and to deploy them on Kubernetes.

Inspect the files in the Docker/ directory, in particular Docker/Dockerfile.* and Docker/*.sql. Use a pager such as more, or the cat command, to view individual files.
Run the build script 01_build_and_upload_images.sh with bash. It builds three Docker images (extract:v1, transform:v1, load:v1) and uploads them to your Kubernetes cluster.

Note: If cat is waiting for input (for example, when run without a file name), press CTRL+D to exit. To quit more, press q.
