
Introduction to ETL and ELT Pipelines

1. Introduction to ETL and ELT Pipelines

Welcome! I'm Jake, and I'll be your instructor as we begin our journey into the world of ETL and ELT pipelines. As a Data Engineer, I leverage tools like Python, SQL, and Airflow to build all sorts of data pipelines. Now, I'm excited to help you do the same. Let's dive right in!

2. BI, ML, and AI

Chances are, you've heard buzzwords like business intelligence, machine learning, and AI thrown around when talking about data. For many companies, initiatives like AI are at the forefront of their effort to drive innovation and growth. But to get started with any of these, we need data to be in the right place, in the right shape, at the right time. How do we do that? With data pipelines!

3. Data pipelines

For all intents and purposes, data pipelines are responsible for moving data from a source to a destination and transforming it somewhere along the way. These data sources, or source systems, can be a variety of things, such as CSV files, APIs, or databases. Once data is pulled from a source system, the goal is to transform it and load it into a destination where it can be used for things like business intelligence, machine learning, or AI. Together, we're going to explore two flavors of data pipelines: ETL and ELT.
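To make those source systems concrete, here's a minimal sketch of reading from each with pandas; the file name, URL, and table name are hypothetical placeholders:

```python
import sqlite3

import pandas as pd

# Three common source systems a pipeline might extract from; the file
# name, URL, and table name below are made up for illustration.
csv_df = pd.read_csv("raw_sales.csv")                   # CSV file on disk
api_df = pd.read_json("https://example.com/api/sales")  # JSON API response
db_df = pd.read_sql(
    "SELECT * FROM sales;",                             # database table
    con=sqlite3.connect("sales.db"),
)
```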

4. ETL and ELT Pipelines

ETL is short for extract, transform, and load. ETL pipelines first extract data before transforming it and loading it into a destination. ETL is the most traditional data pipeline design pattern and may pull from tabular or non-tabular data sources. Typically, ETL pipelines use tools like Python and libraries such as pandas to manipulate and transform data. We'll take a closer look at an ETL pipeline in just a bit. Unlike ETL, ELT pipelines extract and load data before transforming it. ELT is a pattern that has more recently gained traction with the advent of data warehouses. Typically, ELT pipelines operate on tabular source data, loading it into a data warehouse for transformation.

5. Extract, transform, load (ETL)

As promised, here's a look at an ETL pipeline developed using Python. Here, we're using three custom-built Python functions to extract, transform, and load data. The load function writes a pandas DataFrame to a SQL database, and the extract, transform, and load functions are then called to execute the ETL pipeline.
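Here's a minimal sketch of such a pipeline; the file path, table name, and column names are hypothetical:

```python
import pandas as pd
import sqlalchemy


def extract(file_path):
    # Pull raw data from a CSV source system into a DataFrame
    return pd.read_csv(file_path)


def transform(raw_df):
    # Keep only completed orders and standardize column names
    clean_df = raw_df.loc[raw_df["status"] == "completed"]
    return clean_df.rename(columns=str.lower)


def load(clean_df, table_name, engine):
    # Write the transformed DataFrame to a SQL database
    clean_df.to_sql(table_name, con=engine, if_exists="replace", index=False)


# Call extract, transform, and load to execute the pipeline
engine = sqlalchemy.create_engine("sqlite:///pipeline.db")
clean = transform(extract("raw_orders.csv"))
load(clean, "clean_orders", engine)
```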

6. Extract, load, transform (ELT)

Similar to our ETL pipeline from before, we'll create extract, load, and transform functions to build an ELT pipeline. Here is a sample transform function, which creates a table using the results of a SELECT statement. Like before, we call the extract, load, and transform functions to run the pipeline.
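Here's a minimal sketch of such an ELT pipeline, using SQLite as a stand-in warehouse; the table and column names are hypothetical:

```python
import sqlite3

import pandas as pd


def extract(file_path):
    # Pull raw tabular data from the source system
    return pd.read_csv(file_path)


def load(raw_df, table_name, connection):
    # Land the raw data in the warehouse as-is
    raw_df.to_sql(table_name, con=connection, if_exists="replace", index=False)


def transform(connection, source_table, target_table):
    # Create a new table from the results of a SELECT statement,
    # letting the database engine do the transformation work
    connection.execute(f"DROP TABLE IF EXISTS {target_table};")
    connection.execute(
        f"""
        CREATE TABLE {target_table} AS
        SELECT order_id, customer_id, amount
        FROM {source_table}
        WHERE status = 'completed';
        """
    )
    connection.commit()


# Call extract, load, and transform to run the pipeline
connection = sqlite3.connect("warehouse.db")
load(extract("raw_orders.csv"), "raw_orders", connection)
transform(connection, "raw_orders", "clean_orders")
```

Notice that the only change from ETL is the order of the calls: the raw data lands in the database first, and the transformation happens there in SQL rather than in pandas.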

7. We'll also take a look at...

Together, we'll learn how to work with both tabular and non-tabular data. We'll lean on pandas both for data transformation and for writing data to disk and to SQL databases. Make sure to stick around for the end of the course, where we'll cover things like unit-testing, monitoring, and deploying data pipelines to production.

8. Let's practice!

Pretty cool, huh? Now, it's time to build your first ETL and ELT pipelines with a few hands-on exercises. Good luck!