Selecting columns with SQL
Datasets can contain columns that are not required for an analysis, like the weather
table in data.db
does. Some, such as elevation, are redundant, since all observations occurred at the same place, while others contain variables we are not interested in. After making a database engine, you'll write a query to SELECT
only the date and temperature columns, and pass both to read_sql()
to make a dataframe of high and low temperature readings.
pandas
has been loaded as pd
, and create_engine()
has been imported from sqlalchemy
.
Note: The SQL checker is quite picky about column positions and expects fields to be selected in the specified order.
This exercise is part of the course
Streamlined Data Ingestion with pandas
Exercise instructions
- Create a database engine for
data.db
. - Write a SQL query that
SELECT
s thedate
,tmax
, andtmin
columns from theweather
table. - Make a dataframe by passing the query and engine to
read_sql()
and assign the resulting dataframe totemperatures
.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create database engine for data.db
engine = ____
# Write query to get date, tmax, and tmin from weather
query = """
SELECT ____,
____,
____
FROM ____;
"""
# Make a dataframe by passing query and engine to read_sql()
temperatures = ____
# View the resulting dataframe
print(temperatures)