Persisting data to files

Loading data to its final destination is one of the most important steps of a data pipeline. In this exercise, you'll take product sales data that has already been transformed by a transform() function and load it to a .csv file. This will give downstream data consumers a better view into total sales across a range of products.

For this exercise, the sales data has been loaded and transformed, and is stored in the clean_sales_data DataFrame. The pandas package has been imported as pd, and the os library is also ready to use!

This exercise is part of the course

ETL and ELT in Python

Exercise instructions

Update the load() function to write data to the provided path, without headers or an index column.
Check to make sure the file was loaded to the desired file path.
Call the function to load the transformed data to persistent storage.

Hands-on interactive exercise

Try this exercise by completing this sample code.

def load(clean_data, file_path):
    # Write the data to a file
    clean_data.to_csv(file_path, ____, ____)

    # Check to make sure the file exists
    file_exists = os.____.____(____)
    if not file_exists:
        raise Exception(f"File does NOT exist at path {file_path}")

# Load the transformed data to the provided file path
____(clean_sales_data, "transformed_sales_data.csv")
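
For reference, here is a minimal sketch of one way the template above could be completed. It relies on the header and index parameters of DataFrame.to_csv() and on os.path.exists(), both of which exist in pandas and the standard library. The small DataFrame standing in for clean_sales_data, its column names, and the temporary output directory are assumptions made only so the example runs on its own; the exercise itself provides the real data and writes to the working directory.

```python
import os
import tempfile

import pandas as pd


def load(clean_data, file_path):
    # Write the data to a file, omitting the header row and the index column
    clean_data.to_csv(file_path, header=False, index=False)

    # Check to make sure the file exists at the target path
    file_exists = os.path.exists(file_path)
    if not file_exists:
        raise Exception(f"File does NOT exist at path {file_path}")


# Hypothetical stand-in for the clean_sales_data DataFrame (assumed columns)
clean_sales_data = pd.DataFrame(
    {"product": ["A", "B"], "total_sales": [100.0, 250.5]}
)

# Write into a temporary directory (an assumption, to keep the sketch tidy)
out_path = os.path.join(tempfile.mkdtemp(), "transformed_sales_data.csv")
load(clean_sales_data, out_path)

# Read the file back to inspect what was persisted
with open(out_path) as f:
    contents = f.read()
```

Because the header is dropped, downstream consumers need to know the column order in advance; that is the trade-off of the format the instructions ask for.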