Validating data loaded to a Postgres database
In this exercise, you'll finally get to build a data pipeline end-to-end. The pipeline will extract school testing scores from a JSON file and transform the data by dropping rows with missing scores. In addition, each school will be ranked within the city it is located in, based on its total score. Finally, the transformed dataset will be loaded into a Postgres database.
To give you a head start, the extract() and transform() functions have already been built and are used as shown below. In addition, pandas has been imported as pd. Best of luck!
# Extract and clean the testing scores.
raw_testing_scores = extract("testing_scores.json")
cleaned_testing_scores = transform(raw_testing_scores)
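For intuition about what these two steps might do, here is a minimal sketch of extract() and transform() consistent with the description above: read the JSON file into a DataFrame, drop rows with missing scores, and rank schools within each city by total score. The column names ("city", "total_score") and the ranking column are assumptions for illustration, not the course's actual implementation.

```python
import pandas as pd

def extract(file_path):
    # Read the raw testing scores from a JSON file into a DataFrame.
    return pd.read_json(file_path)

def transform(raw_data):
    # Drop rows with a missing total score, then rank schools within
    # each city by total score (1 = highest total in that city).
    cleaned = raw_data.dropna(subset=["total_score"])
    return cleaned.assign(
        city_rank=cleaned.groupby("city")["total_score"]
        .rank(ascending=False, method="dense")
        .astype(int)
    )
```

groupby().rank() assigns ranks per city rather than across the whole dataset, which matches the "ranked by the city they are located in" requirement.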
This exercise is part of the course
ETL and ELT in Python
Hands-on interactive exercise
Try this exercise by completing this sample code.
def load(clean_data, con_engine):
    # Store the data in the schools database
    clean_data.____(
        name="scores_by_city",
        con=con_engine,
        ____="____",  # Make sure to replace existing data
        index=True,
        index_label="school_id"
    )
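For reference, one way to complete the blanks is with pandas' DataFrame.to_sql, passing if_exists="replace" so reruns overwrite the existing table. The sketch below substitutes an in-memory SQLite connection for the Postgres engine purely to stay self-contained; in the exercise you would pass the provided SQLAlchemy engine instead, and the sample data is hypothetical.

```python
import sqlite3

import pandas as pd

def load(clean_data, con_engine):
    # Store the data in the scores_by_city table, replacing any
    # existing rows so the pipeline can be re-run safely.
    clean_data.to_sql(
        name="scores_by_city",
        con=con_engine,
        if_exists="replace",  # replace existing data
        index=True,
        index_label="school_id",
    )

# Hypothetical cleaned scores standing in for cleaned_testing_scores.
sample = pd.DataFrame({"city": ["NYC", "Boston"], "total_score": [1820, 1765]})
con = sqlite3.connect(":memory:")  # stand-in for the Postgres engine
load(sample, con)
```

Because index=True and index_label="school_id", the DataFrame's index is written out as a school_id column in the target table.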