Validating data loaded to a Postgres Database

In this exercise, you'll finally get to build a data pipeline from end-to-end. This pipeline will extract school testing scores from a JSON file and transform the data to drop rows with missing scores. In addition to this, each will be ranked by the city they are located in, based on their total scores. Finally, the transformed dataset will be stored in a Postgres database.

To give you a head start, the extract() and transform() functions have been built and used as shown below. In addition to this, pandas has been imported as pd. Best of luck!

# Extract and clean the testing scores.
raw_testing_scores = extract("testing_scores.json")
cleaned_testing_scores = transform(raw_testing_scores)

Questo esercizio fa parte del corso

ETL and ELT in Python

Visualizza il corso

Esercizio pratico interattivo

Prova a risolvere questo esercizio completando il codice di esempio.

def load(clean_data, con_engine):
	# Store the data in the schools database
    clean_data.____(
    	name="scores_by_city",
		con=con_engine,
		____="____",  # Make sure to replace existing data
		index=True,
		index_label="school_id"
    )

Modifica ed esegui il codice