Validating data loaded to a Postgres Database
In this exercise, you'll finally get to build a data pipeline from end to end. This pipeline will extract school testing scores from a JSON file and transform the data to drop rows with missing scores. In addition, each school will be ranked within the city it is located in, based on its total scores. Finally, the transformed dataset will be stored in a Postgres database.
To give you a head start, the extract() and transform() functions have been built and used as shown below. In addition to this, pandas has been imported as pd. Best of luck!
# Extract and clean the testing scores.
raw_testing_scores = extract("testing_scores.json")
cleaned_testing_scores = transform(raw_testing_scores)
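The exercise does not show the body of transform(), so the following is only a rough sketch of what the described steps (dropping rows with missing scores, then ranking schools within each city by total score) might look like. The function name transform_sketch, the column names city, math_score, reading_score and writing_score, and the sample rows are all assumptions made for illustration, not the course's actual schema or data.

```python
import pandas as pd


def transform_sketch(raw_scores: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for transform(); column names are assumed."""
    score_cols = ["math_score", "reading_score", "writing_score"]

    # Drop any school that is missing at least one score
    cleaned = raw_scores.dropna(subset=score_cols).copy()

    # Total score per school, then rank within each city (1 = highest total)
    cleaned["total_score"] = cleaned[score_cols].sum(axis=1)
    cleaned["rank_in_city"] = cleaned.groupby("city")["total_score"].rank(
        method="dense", ascending=False
    )
    return cleaned


# Made-up sample data, indexed by school_id
sample = pd.DataFrame(
    {
        "city": ["Brooklyn", "Brooklyn", "Queens", "Queens"],
        "math_score": [400.0, 500.0, None, 450.0],
        "reading_score": [410.0, 480.0, 430.0, 440.0],
        "writing_score": [390.0, 470.0, 420.0, 460.0],
    },
    index=pd.Index(["s1", "s2", "s3", "s4"], name="school_id"),
)
ranked = transform_sketch(sample)
```

Here the school with a missing math score is dropped before ranking, so it never receives a rank. The dense ranking method is one choice among several; the course's own transform() may handle ties differently.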
This exercise is part of the course
ETL and ELT in Python
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def load(clean_data, con_engine):
    # Store the data in the schools database
    clean_data.____(
        name="scores_by_city",
        con=con_engine,
        ____="____",  # Make sure to replace existing data
        index=True,
        index_label="school_id"
    )
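As the title suggests, a load step is usually followed by a check that the data actually arrived. Below is a minimal sketch of one way to write a DataFrame to a table and read it back for validation. To stay runnable without a Postgres server it uses an in-memory SQLite connection, which is an assumption of this sketch; in the exercise, con_engine would point at the Postgres schools database instead. The function name load_and_validate and the sample rows are hypothetical.

```python
import sqlite3

import pandas as pd


def load_and_validate(clean_data: pd.DataFrame, con) -> pd.DataFrame:
    """Write clean_data to scores_by_city, then read it back and compare."""
    clean_data.to_sql(
        name="scores_by_city",
        con=con,
        if_exists="replace",  # overwrite the table if it already exists
        index=True,
        index_label="school_id",
    )

    # Read the table back, restoring school_id as the index
    loaded = pd.read_sql(
        "SELECT * FROM scores_by_city", con, index_col="school_id"
    )

    # Basic validation: same number of rows and columns as what was written
    assert loaded.shape == clean_data.shape, "row/column count mismatch"
    return loaded


# Made-up sample data; SQLite in memory stands in for Postgres here
demo = pd.DataFrame(
    {"city": ["Brooklyn", "Queens"], "total_score": [1450.0, 1350.0]},
    index=pd.Index(["s2", "s4"], name="school_id"),
)
connection = sqlite3.connect(":memory:")
loaded_scores = load_and_validate(demo, connection)
```

Comparing shapes is a deliberately light check. Depending on the pipeline, it may be worth also comparing column names or spot-checking values against the source DataFrame.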