Validating data loaded to a Postgres Database

In this exercise, you'll finally get to build a data pipeline from end to end. This pipeline will extract school testing scores from a JSON file and transform the data to drop rows with missing scores. In addition, each school will be ranked within its city, based on its total score. Finally, the transformed dataset will be stored in a Postgres database.

To give you a head start, the extract() and transform() functions have been built and used as shown below. In addition to this, pandas has been imported as pd. Best of luck!

# Extract and clean the testing scores.
raw_testing_scores = extract("testing_scores.json")
cleaned_testing_scores = transform(raw_testing_scores)
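
The body of transform() is not shown in this exercise. As a rough illustration of the steps described above (drop rows with missing scores, then rank each school within its city by total score), a minimal sketch might look like the following. The function name transform_sketch and the column names city, math_score, reading_score, writing_score, total_score, and rank_in_city are assumptions made for this example and may not match the course's dataset.

```python
import pandas as pd

# Hypothetical column names; the real dataset may use different ones.
SCORE_COLUMNS = ["math_score", "reading_score", "writing_score"]


def transform_sketch(raw_data: pd.DataFrame) -> pd.DataFrame:
    """Illustrative stand-in for transform(); not the course's implementation."""
    # Drop any school that is missing at least one score.
    cleaned = raw_data.dropna(subset=SCORE_COLUMNS).copy()

    # Total score per school across the assumed score columns.
    cleaned["total_score"] = cleaned[SCORE_COLUMNS].sum(axis=1)

    # Rank schools within each city; the highest total score gets rank 1.
    cleaned["rank_in_city"] = (
        cleaned.groupby("city")["total_score"]
        .rank(method="dense", ascending=False)
        .astype(int)
    )
    return cleaned


# Small made-up sample to exercise the sketch.
sample_scores = pd.DataFrame(
    {
        "city": ["X", "X", "X", "Y"],
        "math_score": [500.0, 400.0, None, 300.0],
        "reading_score": [500.0, 400.0, 450.0, 300.0],
        "writing_score": [500.0, 400.0, 450.0, 300.0],
    },
    index=["s1", "s2", "s3", "s4"],
)
ranked_scores = transform_sketch(sample_scores)
```

The dense ranking method is one choice among several; it gives tied schools the same rank without leaving gaps in the sequence.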

This exercise is part of the course

ETL and ELT in Python

Hands-on interactive exercise

Try this exercise by completing the sample code below.

def load(clean_data, con_engine):
    # Store the data in the schools database
    clean_data.____(
        name="scores_by_city",
        con=con_engine,
        ____="____",  # Make sure to replace existing data
        index=True,
        index_label="school_id"
    )
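
Once the data has been written, one way to validate the load is to read the table back and compare it with the DataFrame that was stored. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite connection as a stand-in for the Postgres engine so that it runs without a database server, and the helper name load_and_validate and the sample columns are hypothetical. The calls it relies on are pandas' DataFrame.to_sql() and pd.read_sql().

```python
import sqlite3

import pandas as pd


def load_and_validate(clean_data: pd.DataFrame, con) -> pd.DataFrame:
    """Write the data to scores_by_city, then read it back and check it."""
    # if_exists="replace" drops and recreates the table if it already exists.
    clean_data.to_sql(
        name="scores_by_city",
        con=con,
        if_exists="replace",
        index=True,
        index_label="school_id",
    )

    # Query the table that was just written.
    loaded = pd.read_sql(
        "SELECT * FROM scores_by_city", con, index_col="school_id"
    )

    # Basic validation: same number of rows and columns, same school ids.
    assert loaded.shape == clean_data.shape
    assert list(loaded.index) == list(clean_data.index)
    return loaded


# Assumption: SQLite stands in for Postgres purely to keep the sketch runnable.
con = sqlite3.connect(":memory:")
sample = pd.DataFrame(
    {"city": ["X", "Y"], "total_score": [1500, 900]},
    index=["s1", "s2"],
)
loaded = load_and_validate(sample, con)
```

Against a real Postgres database, the connection would typically be a SQLAlchemy engine, such as the con_engine argument in the exercise, rather than a sqlite3 connection.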