Get startedGet started for free

ETL and ELT

1. ETL and ELT

Hey, and welcome back! In this video, we will revisit the data integration process of ETL and compare it to a similar technique called ELT. We can use both methods for integrating data in a data warehouse.

2. Understanding ETL and ELT names

ETL and ELT begin with extracting the data, and both load and transform data, but the orders are different. In an ETL process, the data is transformed and cleaned, then loaded into the data warehouse. In comparison, ELT stands for extract, load, and transform. In this process, the data is extracted, loaded into the data warehouse, and then transformed. The overall goal for both methods is to end up with cleansed data that we can use in the data warehouse, but the order of the steps they take is different. The key to the ELT process is that the data is loaded before being transformed. We will review this more shortly but first let's look at the ETL process again.

3. Understanding ETL - Pros and Cons

With the ETL process, the data is transformed as it moves from the inputs to the warehouse. This process requires organizations to use a cloud-based service or purchase the software and computer servers to run it themselves. Overall the ETL process requires a separate system from the data warehouse to perform the transformations as the data moves. A couple of pros of using an ETL process come about because the process only delivers cleaned and transformed data to the warehouse. First, ETL tends to have lower data storage costs than ELT because it only keeps copies of the transformed data. Secondly, meeting compliance security standards on PII or personally identifiable information is more manageable. We can omit PII data in the source systems before loading it into the warehouse, so sensitive data will never make it into our data warehouse, and many ETL tools are certified to meet government PII security standards. However, there are a couple of cons to using an ETL process. First, we do not store the original data from the input sources during the transformation process. Therefore, any errors or changes to the transformation logic mean pulling the data from the source systems again. It can be challenging to find the time to pull large batches of data from the systems without slowing them down. Additionally, there are costs related to the ETL's separate computer system.

4. Understanding ELT - Pros and Cons

However, the ELT process extracts data from the sources and loads full copies of that data into the data warehouse. Then the resources and software of the data warehouse are used to transform the data. Afterward, the transformed data is then used as usual for the data warehouse. Some pros of the ELT process are, no need for a separate computer system for the transformation process. Additionally, we can rerun the transformation process without impacting the source systems. Finally, it is often used for near real-time situations because the amount of time needed to load the data is not connected to the complexity of the transformations in the ELT process. However, cons include the need for more storage to hold the copies of the original data. Also, additional considerations are needed to comply with the security standards if the data sources contain PII data.

5. The cloud and ELT

The ELT processes have grown in popularity due in part to the growth of cloud data warehouses. We will speak more about this in an upcoming video. Still, for now, the cloud allows an organization to store a near-unlimited amount of data with access to vast computing resources. This means an organization is not limited by storage and can process the transformations faster using parallel computing power.

6. Let's practice!

Okay, time for some practice!