1. Refining imports with SQL queries
Now that you know how to connect to a database and query it with pandas, let's focus on refining imports with different SQL queries.
2. SELECTing Columns
pandas' read SQL function has fewer arguments than read Excel or read CSV, but SQL lets you customize data imports in even more ways.
The last lesson mentioned one such way. Specifying columns in a select statement lets you load only variables you are interested in studying, akin to usecols in read Excel and read CSV.
The statement "select date, t average from weather", for example, gets only the date and average temperatures in the weather table.
3. WHERE Clauses
The other common way to selectively get data with SQL is by using a where clause to filter rows.
The syntax looks like "select columns from table where" a given condition or conditions are met.
4. Filtering by Numbers
To import records based on numeric values, we can use standard operators to compare numbers:
equals,
greater than and greater than or equal to,
less than and less than or equal to,
and not equal to. Note that SQL's equality and inequality operators differ from Python's.
The weather data contains Fahrenheit temperatures, so the SQL query to get all days with a high temperature above freezing would be "select star from weather where t max is greater than 32."
5. Filtering Text
Where clauses can also filter text. To match a string exactly, use the equal sign followed by the text to match in single quotes.
Note that string matching is case sensitive.
To get call records about incidents in the borough of Brooklyn, for example, the query would be "select star from hpd311calls where borough equals Brooklyn", with Brooklyn in single quotes.
6. SQL and pandas
Let's combine SQL and Python to get Brooklyn calls.
As before, we import pandas and SQLalchemy's create engine function.
We pass create engine the URL to data dot db and assign the result to engine.
Since our SQL queries are getting more complicated, we'll write out the query first and assign it to the variable "query." Wrapping the string in triple quotes lets us split it between multiple lines so it's easier to read.
Then we pass the query and engine to read SQL.
When we check the unique borough values in the resulting dataframe, we see there are only Brooklyn calls.
7. Combining Conditions: AND
We can even combine conditions with SQL's and and or operators.
And returns only records where all conditions are true.
For example, if we create the query "select star from hpd311calls where borough equals Bronx and complaint type equals plumbing",
then pass the query and engine to read SQL,
we get the 2,016 call records about plumbing from the Bronx.
8. Combining Conditions: OR
The or operator returns records with at least one met condition.
If we change our query to get records that are about plumbing or water leaks,
we get 10,684 records that meet one or both conditions.
9. Let's practice!
As you probably noticed, SQL is integral to building pipelines to databases, even when you're working in Python. Now, it's your turn to practice. Good luck!