Persisting data to files
Loading data to a final destination is one of the most important steps of a data pipeline. In this exercise, you'll complete the load() function shown below to write transformed product sales data to a .csv file. This will give downstream data consumers a better view into total sales across a range of products.
For this exercise, the sales data has been loaded and transformed, and is stored in the clean_sales_data DataFrame. The pandas package has been imported as pd, and the os library is also ready to use!
This exercise is part of the course ETL and ELT in Python.
Exercise instructions
- Update the load() function to write data to the provided path, without headers or an index column.
- Check to make sure the file was loaded to the desired file path.
- Call the function to load the transformed data to persistent storage.
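To make the first instruction concrete, the sketch below compares the default CSV output of a DataFrame with the output when both the header row and the index column are dropped. The small DataFrame and its column names are made up for illustration and are not the course's sales data. Calling to_csv() without a path returns the CSV text as a string, which makes the difference easy to inspect.

```python
import pandas as pd

# Hypothetical sample data; the real exercise uses clean_sales_data
sample = pd.DataFrame({"product": ["A", "B"], "units": [3, 5]})

# Default behaviour: a header row plus a leading index column
with_defaults = sample.to_csv().splitlines()

# Header row and index column both suppressed
without_extras = sample.to_csv(header=False, index=False).splitlines()

print(with_defaults)   # [',product,units', '0,A,3', '1,B,5']
print(without_extras)  # ['A,3', 'B,5']
```

Dropping the header keeps the file compact, but downstream consumers then need to know the column order from somewhere else.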
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def load(clean_data, file_path):
    # Write the data to a file
    clean_data.to_csv(file_path, ____, ____)

    # Check to make sure the file exists
    file_exists = os.____.____(____)
    if not file_exists:
        raise Exception(f"File does NOT exist at path {file_path}")

# Load the transformed data to the provided file path
____(clean_sales_data, "transformed_sales_data.csv")
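One way the blanks might be filled in is sketched here. It is not the course's official solution: it assumes that header=False and index=False are the intended to_csv() arguments and that os.path.exists() is the intended existence check. The clean_sales_data DataFrame is a made-up stand-in, and the file is written to a temporary directory so the example can run anywhere.

```python
import os
import tempfile

import pandas as pd


def load(clean_data, file_path):
    # Write the data to a file, without a header row or an index column
    clean_data.to_csv(file_path, header=False, index=False)

    # Check to make sure the file exists
    file_exists = os.path.exists(file_path)
    if not file_exists:
        raise Exception(f"File does NOT exist at path {file_path}")


# Hypothetical stand-in for the transformed sales data
clean_sales_data = pd.DataFrame(
    {"product": ["A", "B"], "total_sales": [100.0, 250.5]}
)

# Write into a temporary directory (an assumption made for this sketch)
output_path = os.path.join(tempfile.mkdtemp(), "transformed_sales_data.csv")

# Load the transformed data to the provided file path
load(clean_sales_data, output_path)
```

Because the file has no header row, reading it back with pd.read_csv(output_path, header=None) keeps the first data row from being treated as column names.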