Logging within a data pipeline
In this exercise, we'll revisit the function you wrote in a previous video and practice adding logging to it. This will help when troubleshooting errors or making changes to the logic!
`pandas` has been imported as `pd`. In addition, the `logging` module has been imported, and the default log level has been set to `"debug"`.
This exercise is part of the course "ETL and ELT in Python".
Exercise instructions
- Create an info-level log after the transformation, passing the string: "Transformed 'Order Date' column to type 'datetime'."
- Log the `.shape` of the DataFrame at the debug level before and after filtering.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def transform(raw_data):
    raw_data["Order Date"] = pd.to_datetime(raw_data["Order Date"], format="%m/%d/%y %H:%M")
    clean_data = raw_data.loc[raw_data["Price Each"] < 10, :]

    # Create an info log regarding transformation
    logging.____("Transformed 'Order Date' column to type 'datetime'.")

    # Create debug-level logs for the DataFrame before and after filtering
    ____(f"Shape of the DataFrame before filtering: {raw_data.shape}")
    ____(f"Shape of the DataFrame after filtering: {clean_data.shape}")
    return clean_data
clean_sales_data = transform(raw_sales_data)
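One way the completed function could look is sketched below, with the blanks filled in by `logging.info` and `logging.debug`. The small DataFrame standing in for `raw_sales_data` is hypothetical, invented here so the sketch runs on its own; the real exercise data is provided by the course environment.

```python
import logging
import pandas as pd

logging.basicConfig(level=logging.DEBUG)

def transform(raw_data):
    raw_data["Order Date"] = pd.to_datetime(raw_data["Order Date"], format="%m/%d/%y %H:%M")
    clean_data = raw_data.loc[raw_data["Price Each"] < 10, :]

    # Info-level log after the transformation
    logging.info("Transformed 'Order Date' column to type 'datetime'.")

    # Debug-level logs for the DataFrame shape before and after filtering
    logging.debug(f"Shape of the DataFrame before filtering: {raw_data.shape}")
    logging.debug(f"Shape of the DataFrame after filtering: {clean_data.shape}")
    return clean_data

# Hypothetical stand-in for raw_sales_data (two rows, one above the price cutoff)
raw_sales_data = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30"],
    "Price Each": [11.95, 2.99],
})
clean_sales_data = transform(raw_sales_data)
print(clean_sales_data.shape)  # (1, 2): only the row with "Price Each" < 10 remains
```

Because the log level is `"debug"`, both the info message and the two debug messages appear in the output, which makes it easy to confirm how many rows the filter removed.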