Persisting data to files
Loading data to a final destination is one of the most important steps of a data pipeline. In this exercise, you'll use the transform() function shown below to transform product sales data before loading it to a .csv file. This will give downstream data consumers a better view into total sales across a range of products.
For this exercise, the sales data has been loaded and transformed, and is stored in the clean_sales_data DataFrame. The pandas package has been imported as pd, and the os library is also ready to use!
This exercise is part of the course
ETL and ELT in Python
Exercise instructions
- Update the load() function to write data to the provided path, without headers or an index column.
- Check to make sure the file was loaded to the desired file path.
- Call the function to load the transformed data to persistent storage.
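To see what "without headers or an index column" means in practice, the sketch below compares the default output of to_csv() with the output when both are switched off. The sample DataFrame and its column names are made up for illustration and merely stand in for clean_sales_data; calling to_csv() without a path returns the CSV text instead of writing a file, which makes the difference easy to inspect.

```python
import pandas as pd

# Hypothetical sample data, standing in for clean_sales_data
sample = pd.DataFrame(
    {"product": ["widget", "gadget"], "total_sales": [120, 75]}
)

# With no path argument, to_csv() returns the CSV text as a string
with_defaults = sample.to_csv()

# header=False drops the column-name row; index=False drops the row labels
without_extras = sample.to_csv(header=False, index=False)

print(with_defaults)
print(without_extras)
```

With the defaults, the first row holds the column names and each data row starts with its index label; with both options set to False, only the data values remain.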
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def load(clean_data, file_path):
    # Write the data to a file
    clean_data.to_csv(file_path, ____, ____)

    # Check to make sure the file exists
    file_exists = os.____.____(____)
    if not file_exists:
        raise Exception(f"File does NOT exist at path {file_path}")

# Load the transformed data to the provided file path
____(clean_sales_data, "transformed_sales_data.csv")
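The write-then-verify pattern can be tried outside the exercise with a minimal, self-contained sketch like the one below. It is one possible approach, not the exercise's reference solution: the function name load_and_verify, the sample data, and the file name sample_sales.csv are all hypothetical, the check uses os.path.exists(), and a temporary directory keeps the example from leaving files behind.

```python
import os
import tempfile

import pandas as pd


def load_and_verify(data, file_path):
    """Write data to file_path as CSV, then confirm the file is on disk."""
    # Write only the values: no header row, no index column
    data.to_csv(file_path, header=False, index=False)

    # Fail loudly if the file did not end up where it was expected
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File does not exist at path {file_path}")
    return file_path


# Hypothetical sample data, standing in for clean_sales_data
sample = pd.DataFrame(
    {"product": ["widget", "gadget"], "total_sales": [120, 75]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "sample_sales.csv")
    written_path = load_and_verify(sample, target)
    existed_after_write = os.path.exists(written_path)

    # Read the file back to inspect exactly what was persisted
    with open(written_path) as csv_file:
        written_lines = csv_file.read().splitlines()

print(existed_after_write)
print(written_lines)
```

Raising an error when the file is missing stops the pipeline at the load step rather than letting a later stage fail on an absent file.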