Getting distinct values
Sometimes an analysis doesn't need every record, but rather unique values in one or more columns. Duplicate values can be removed after loading data into a dataframe, but it can also be done at import with SQL's DISTINCT
keyword.
Since hpd311calls
contains data about housing issues, we would expect most records to have a borough listed. Let's test this assumption by querying unique complaint_type
/borough
combinations.
pandas
has been imported as pd
, and the database engine has been created as engine
.
Note: The SQL checker is quite picky about column positions and expects fields to be selected in the specified order.
Este ejercicio forma parte del curso
Streamlined Data Ingestion with pandas
Instrucciones del ejercicio
- Create a query that gets
DISTINCT
values forborough
andcomplaint_type
(in that order) fromhpd311calls
. - Use
read_sql()
to load the results of the query to a dataframe,issues_and_boros
. - Print the dataframe to check if the assumption that all issues besides literature requests appear with boroughs listed.
Ejercicio interactivo práctico
Prueba este ejercicio completando el código de muestra.
# Create query for unique combinations of borough and complaint_type
query = """
SELECT ____ ____,
____
____ hpd311calls;
"""
# Load results of query to a dataframe
issues_and_boros = ____
# Check assumption about issues and boroughs
print(____)