Filtering pandas DataFrames

Once data has been extracted from a source system, it's time to transform it. Source data often contains more information than downstream use cases need. When that is the case, reduce the data during the "transform" phase of the pipeline by filtering out the rows and columns that are not needed.

pandas has been imported as pd, and the extract() function is available to load a DataFrame from the path that is passed.
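
For readers working outside the exercise environment, here is a minimal sketch of what an extract() helper might look like. The course's actual implementation is not shown in the exercise, so the body below is an assumption: it simply wraps pd.read_parquet, which needs a Parquet engine such as pyarrow or fastparquet to be installed.

```python
import pandas as pd


def extract(file_path):
    # Assumed implementation, not the course's own code:
    # read a Parquet file at file_path into a DataFrame.
    return pd.read_parquet(file_path)
```

With a helper like this, extract("sales_data.parquet") would return a DataFrame, provided the file exists at that path.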

This exercise is part of the course

ETL and ELT in Python

Exercise instructions

  • Use the extract() function to load the DataFrame stored in the "sales_data.parquet" path.
  • Update the transform() function to keep only the rows where "Quantity Ordered" is greater than 1, retaining all columns.
  • Further filter the clean_data DataFrame to only include columns "Order Date", "Quantity Ordered" and "Purchase Address".
  • Return the filtered DataFrame.
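
The filtering pattern these steps describe can be sketched on a small toy DataFrame. The data, column names ("item", "units", "city", "price") and the keep_bulk_orders() function below are hypothetical and chosen only for illustration; they are not the exercise dataset or its solution.

```python
import pandas as pd

# Hypothetical toy data; these column names are not from the exercise dataset
orders = pd.DataFrame({
    "item": ["cable", "monitor", "mouse", "laptop"],
    "units": [3, 1, 2, 1],
    "city": ["Boston", "Austin", "Seattle", "Dallas"],
    "price": [11.95, 149.99, 24.50, 999.00],
})


def keep_bulk_orders(raw_data):
    # A boolean mask selects the rows; ":" keeps every column
    filtered = raw_data.loc[raw_data["units"] > 1, :]
    # A list of labels keeps only those columns, in the order given
    filtered = filtered.loc[:, ["item", "units"]]
    return filtered


bulk_orders = keep_bulk_orders(orders)
print(bulk_orders)
```

Filtering rows first and columns second mirrors the order of the instructions. Both steps could also be combined into a single .loc call that takes the mask and the column list together.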

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Extract data from the sales_data.parquet path
raw_sales_data = ____("sales_data.parquet")

def transform(raw_data):
    # Only keep rows with `Quantity Ordered` greater than 1
    clean_data = raw_data.____[____, :]
    
    # Only keep columns "Order Date", "Quantity Ordered", and "Purchase Address"
    clean_data = ____
    
    # Return the filtered DataFrame
    return ____
    
transform(raw_sales_data)