Joining tables
Tables in relational databases usually have key columns of unique record identifiers. This lets us build pipelines that combine tables using SQL's JOIN
operation, instead of having to combine data after importing it.
The records in hpd311calls
often concern issues, like leaks or heating problems, that are exacerbated by weather conditions. In this exercise, you'll join weather
data to call records along their common date columns to get everything in one dataframe. You can assume these columns have the same data type.
pandas
is loaded as pd
, and the database engine, engine
, has been created.
Note: The SQL checker is picky about join table order -- it expects specific tables on the left and the right.
This exercise is part of the course
Streamlined Data Ingestion with pandas
Exercise instructions
- Complete the query to join
weather
tohpd311calls
by theirdate
andcreated_date
columns, respectively. - Query the database and assign the resulting dataframe to
calls_with_weather
. - Print the first few rows of
calls_with_weather
to confirm all columns were joined.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Query to join weather to call records by date columns
query = """
SELECT *
FROM hpd311calls
JOIN ____
ON hpd311calls.____ = ____.____;
"""
# Create dataframe of joined tables
calls_with_weather = ____
# View the dataframe to make sure all columns were joined
____