Validating data loaded to a Postgres Database

In this exercise, you'll finally get to build a data pipeline end to end. This pipeline will extract school testing scores from a JSON file and transform the data to drop rows with missing scores. In addition, each school will be ranked within its city, based on its total score. Finally, the transformed dataset will be stored in a Postgres database.

To give you a head start, the extract() and transform() functions have been built and used as shown below. In addition to this, pandas has been imported as pd. Best of luck!

# Extract and clean the testing scores.
raw_testing_scores = extract("testing_scores.json")
cleaned_testing_scores = transform(raw_testing_scores)
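
The exercise provides transform() without showing its body. As a rough illustration of the two steps described above (dropping rows with missing scores, then ranking schools within each city by total score), a minimal sketch might look like the following. The function name transform_sketch and the column names (school, city, math_score, reading_score, writing_score) are assumptions made for this example, not the exercise's actual schema.

```python
import pandas as pd

# Hypothetical column names; the real dataset's schema is not shown in the exercise.
SCORE_COLUMNS = ["math_score", "reading_score", "writing_score"]


def transform_sketch(raw_data: pd.DataFrame) -> pd.DataFrame:
    # Drop any row that is missing at least one score.
    cleaned = raw_data.dropna(subset=SCORE_COLUMNS).copy()
    # Sum the individual scores into a single total per school.
    cleaned["total_score"] = cleaned[SCORE_COLUMNS].sum(axis=1)
    # Rank schools within each city; rank 1 is the highest total score.
    cleaned["rank_in_city"] = (
        cleaned.groupby("city")["total_score"]
        .rank(method="dense", ascending=False)
        .astype(int)
    )
    return cleaned


# Small made-up sample to exercise the sketch.
sample = pd.DataFrame({
    "school": ["A", "B", "C", "D"],
    "city": ["Springfield", "Springfield", "Shelbyville", "Springfield"],
    "math_score": [500.0, 450.0, 480.0, None],
    "reading_score": [510.0, 470.0, 490.0, 400.0],
    "writing_score": [490.0, 460.0, 470.0, 410.0],
})
ranked = transform_sketch(sample)
```

Here school D is dropped because its math score is missing, and the remaining schools are ranked only against other schools in the same city.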

Update the load() function to write the clean_data DataFrame to the scores_by_city table in the schools database.
If data already exists in the scores_by_city table, make sure to replace it with the updated data.
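
One way to approach these instructions is with pandas' DataFrame.to_sql(), whose if_exists="replace" option drops and recreates the table on each load. The sketch below is an illustration under stated assumptions, not the exercise's reference solution: the con_engine parameter name, the index=False choice, and the sample rows are invented for this example, and an in-memory SQLite connection stands in for the Postgres schools database so the snippet runs without a server. Against Postgres, you would typically pass a SQLAlchemy engine as the connection instead.

```python
import sqlite3

import pandas as pd


def load(clean_data: pd.DataFrame, con_engine) -> None:
    # Write the DataFrame to scores_by_city, replacing the table if it exists.
    clean_data.to_sql(
        name="scores_by_city",
        con=con_engine,
        if_exists="replace",
        index=False,  # assumption: the DataFrame index is not stored as a column
    )


# Stand-in for the Postgres connection (assumption made so the sketch is runnable).
connection = sqlite3.connect(":memory:")

# Made-up rows; the first load simulates data already present in the table.
first_batch = pd.DataFrame({"city": ["Springfield"], "total_score": [1500]})
load(first_batch, connection)

# The second load should replace the existing rows rather than append to them.
updated_batch = pd.DataFrame({
    "city": ["Springfield", "Shelbyville"],
    "total_score": [1510, 1440],
})
load(updated_batch, connection)

# Read the table back to check what was actually stored.
stored = pd.read_sql("SELECT * FROM scores_by_city", connection)
```

Reading the table back after loading is a simple validation step: if the replace worked, stored contains only the rows from the most recent load.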
