Logging within a data pipeline
In this exercise, we'll revisit the function you wrote in a previous video and practice adding logging to it. This will help when troubleshooting errors or making changes to the logic!
pandas has been imported as pd. In addition to this, the logging module has been imported, and the default log level has been set to "debug".
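In a standalone script, that setup would typically look something like the sketch below. This is only an assumption about how the exercise environment is configured; logging.basicConfig is one common way to set the root logger's level.

import logging

import pandas as pd

# Emit messages at DEBUG level and above
logging.basicConfig(level=logging.DEBUG)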
This exercise is part of the course “ETL and ELT in Python”.
Exercise instructions
- Create an info-level log after the transformation, passing the string: "Transformed 'Order Date' column to type 'datetime'."
- Log the .shape of the DataFrame at the debug level before and after filtering.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def transform(raw_data):
    raw_data["Order Date"] = pd.to_datetime(raw_data["Order Date"], format="%m/%d/%y %H:%M")
    clean_data = raw_data.loc[raw_data["Price Each"] < 10, :]

    # Create an info log regarding transformation
    logging.____("Transformed 'Order Date' column to type 'datetime'.")

    # Create debug-level logs for the DataFrame before and after filtering
    ____(f"Shape of the DataFrame before filtering: {raw_data.shape}")
    ____(f"Shape of the DataFrame after filtering: {clean_data.shape}")

    return clean_data

clean_sales_data = transform(raw_sales_data)
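For reference, here is one possible completion of the scaffold, assuming the blanks resolve to logging.info and logging.debug as the instructions describe. The sample rows in raw_sales_data are hypothetical, included only so the sketch runs end to end.

import logging

import pandas as pd

# Assumed setup: debug-level messages are shown
logging.basicConfig(level=logging.DEBUG)


def transform(raw_data):
    # Convert the "Order Date" column from string to datetime
    raw_data["Order Date"] = pd.to_datetime(raw_data["Order Date"], format="%m/%d/%y %H:%M")
    # Keep only rows where each unit costs less than 10
    clean_data = raw_data.loc[raw_data["Price Each"] < 10, :]

    # Info-level log confirming the type conversion
    logging.info("Transformed 'Order Date' column to type 'datetime'.")

    # Debug-level logs recording the shape before and after filtering
    logging.debug(f"Shape of the DataFrame before filtering: {raw_data.shape}")
    logging.debug(f"Shape of the DataFrame after filtering: {clean_data.shape}")

    return clean_data


# Hypothetical sample data standing in for raw_sales_data
raw_sales_data = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30"],
    "Price Each": [11.95, 2.99],
})

clean_sales_data = transform(raw_sales_data)

Because the root logger's level is DEBUG, both the info message and the two debug messages appear in the output, which makes it easy to confirm that the filter reduced the row count as expected.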