Persisting data to files
Loading data to a final destination is one of the most important steps of a data pipeline. In this exercise, you'll use the transform() function shown below to transform product sales data before loading it to a .csv file. This will give downstream data consumers a better view into total sales across a range of products.
For this exercise, the sales data has been loaded and transformed, and is stored in the clean_sales_data DataFrame. The pandas package has been imported as pd, and the os library is also ready to use!
This exercise is part of the course
ETL and ELT in Python
Exercise instructions
- Update the load() function to write data to the provided path, without headers or an index column.
- Check to make sure the file was loaded to the desired file path.
- Call the function to load the transformed data to persistent storage.
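To see what "without headers or an index column" means in practice, the sketch below compares the default CSV output with the stripped-down version. The small DataFrame and its column names are made up for illustration; `header` and `index` are standard keyword arguments of `DataFrame.to_csv()`, which returns the CSV text as a string when no path is given.

```python
import pandas as pd

# Hypothetical sample data, not the exercise's real sales data
sample = pd.DataFrame({"product": ["A", "B"], "sales": [10, 20]})

# Default behaviour: a header row plus the index as the first column
with_extras = sample.to_csv()

# Header row and index column suppressed: only the data values remain
bare = sample.to_csv(header=False, index=False)

print(with_extras.splitlines())
print(bare.splitlines())
```

Dropping the header and index keeps the file compact, but downstream consumers then need to know the column order from somewhere else, such as documentation or a schema.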
Hands-on interactive exercise
Try this exercise by completing this sample code.
def load(clean_data, file_path):
    # Write the data to a file
    clean_data.to_csv(file_path, ____, ____)

    # Check to make sure the file exists
    file_exists = os.____.____(____)
    if not file_exists:
        raise Exception(f"File does NOT exist at path {file_path}")

# Load the transformed data to the provided file path
____(clean_sales_data, "transformed_sales_data.csv")
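One possible way to complete the template is sketched here so it can be run outside the exercise environment. It assumes the blanks are meant to be `header=False`, `index=False`, and `os.path.exists(file_path)`. The `clean_sales_data` DataFrame below is a hypothetical stand-in for the preloaded one, and the file is written to a temporary directory rather than the working directory.

```python
import os
import tempfile

import pandas as pd


def load(clean_data, file_path):
    # Write the data to a file, without a header row or an index column
    clean_data.to_csv(file_path, header=False, index=False)

    # Check to make sure the file exists at the expected path
    file_exists = os.path.exists(file_path)
    if not file_exists:
        raise Exception(f"File does NOT exist at path {file_path}")


# Hypothetical stand-in for the exercise's clean_sales_data DataFrame
clean_sales_data = pd.DataFrame(
    {"product": ["widget", "gadget"], "total_sales": [120.5, 98.0]}
)

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = os.path.join(tmp_dir, "transformed_sales_data.csv")
    load(clean_sales_data, output_path)

    file_was_written = os.path.exists(output_path)
    with open(output_path) as csv_file:
        written_lines = csv_file.read().splitlines()

print(file_was_written, written_lines)
```

Checking for the file right after writing it is a cheap guard: if the write failed silently or went to an unexpected location, the pipeline stops here instead of passing a missing file downstream.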