
Logging within a data pipeline

In this exercise, we'll revisit the function you wrote in a previous video and practice adding logging to it. Logging will help when troubleshooting errors or making changes to the logic!

pandas has been imported as pd. The logging module has also been imported, and the default log level has been set to "debug".

This exercise is part of the course "ETL and ELT in Python".

Exercise instructions

  • Create an info-level log after the transformation, passing the string: "Transformed 'Order Date' column to type 'datetime'."
  • Log the .shape of the DataFrame at the debug level, both before and after filtering.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

def transform(raw_data):
    raw_data["Order Date"] = pd.to_datetime(raw_data["Order Date"], format="%m/%d/%y %H:%M")
    clean_data = raw_data.loc[raw_data["Price Each"] < 10, :]
    
    # Create an info log regarding transformation
    logging.____("Transformed 'Order Date' column to type 'datetime'.")
    
    # Create debug-level logs for the DataFrame before and after filtering
    ____(f"Shape of the DataFrame before filtering: {raw_data.shape}")
    ____(f"Shape of the DataFrame after filtering: {clean_data.shape}")
    
    return clean_data
  
clean_sales_data = transform(raw_sales_data)
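For reference, a completed version of the scaffold might look like the sketch below. The sample DataFrame is a hypothetical stand-in for raw_sales_data, which the exercise environment normally provides:

```python
import logging
import pandas as pd

# Assumed setup, per the exercise description
logging.basicConfig(level=logging.DEBUG)

def transform(raw_data):
    # Convert "Order Date" from string to datetime
    raw_data["Order Date"] = pd.to_datetime(raw_data["Order Date"], format="%m/%d/%y %H:%M")
    # Keep only rows where "Price Each" is below 10
    clean_data = raw_data.loc[raw_data["Price Each"] < 10, :]

    # Info-level log regarding the transformation
    logging.info("Transformed 'Order Date' column to type 'datetime'.")

    # Debug-level logs for the DataFrame before and after filtering
    logging.debug(f"Shape of the DataFrame before filtering: {raw_data.shape}")
    logging.debug(f"Shape of the DataFrame after filtering: {clean_data.shape}")

    return clean_data

# Hypothetical sample data standing in for raw_sales_data
raw_sales_data = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30"],
    "Price Each": [11.95, 2.99],
})
clean_sales_data = transform(raw_sales_data)
```

The blanks map to logging.info for the transformation message and logging.debug for the two shape messages; because .loc returns a new DataFrame, raw_data.shape still reflects the pre-filter row count even though it is logged after the filtering line runs.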


