Dropping rows
When you know that a specific column will be critical to your analysis, and only a small fraction of rows are missing a value in that column, it often makes sense to remove those rows from the dataset.
During this course, the driver_gender column will be critical to many of your analyses. Because only a small fraction of rows are missing driver_gender, we'll drop those rows from the dataset.
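Before dropping anything, it can help to confirm that the fraction of missing values really is small. The following is a minimal sketch on a made-up DataFrame named toy, not the course's ri dataset; the stop_date column and all of the values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (hypothetical values, not the course dataset)
toy = pd.DataFrame({
    "stop_date": ["2005-01-04", "2005-01-23", "2005-02-17", "2005-02-20", "2005-03-14"],
    "driver_gender": ["M", "F", np.nan, "M", "F"],
})

# isnull() marks each missing cell as True; the mean of a boolean column
# is the fraction of True values, i.e. the share of rows that are missing
missing_fraction = toy.isnull().mean()
print(missing_fraction)
```

Here one of the five rows is missing driver_gender, so that column's fraction is 0.2. What counts as "small" is a judgment call that depends on the dataset and the analysis.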
This exercise is part of the course Analyzing Police Activity with pandas.
Exercise instructions
- Count the number of missing values in each column.
- Drop all rows that are missing driver_gender by passing the column name to the subset parameter of .dropna().
- Count the number of missing values in each column again, to verify that none of the remaining rows are missing driver_gender.
- Examine the DataFrame's .shape to see how many rows and columns remain.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Count the number of missing values in each column
print(ri.isnull().____)
# Drop all rows that are missing 'driver_gender'
ri.____(subset=[____], inplace=True)
# Count the number of missing values in each column (again)
print(ri.____.____)
# Examine the shape of the DataFrame
print(____)
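The four steps can be sketched end to end on a small stand-in DataFrame. This is one possible way to carry them out, not the course's reference solution: ri_toy, the violation column, and every value in it are hypothetical, and the real ri DataFrame is assumed to be loaded elsewhere in the course.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's `ri` DataFrame
ri_toy = pd.DataFrame({
    "driver_gender": ["M", np.nan, "F", "M", np.nan],
    "violation": ["Speeding", np.nan, "Equipment", np.nan, "Speeding"],
})

# Count the number of missing values in each column
missing_before = ri_toy.isnull().sum()
print(missing_before)

# Drop only the rows where 'driver_gender' is missing;
# rows missing other columns (such as 'violation') are kept
ri_toy.dropna(subset=["driver_gender"], inplace=True)

# Count again to verify that no remaining row is missing 'driver_gender'
missing_after = ri_toy.isnull().sum()
print(missing_after)

# Examine how many rows and columns remain
print(ri_toy.shape)
```

Because subset restricts the check to driver_gender, the row that is missing only violation survives. Calling .dropna() with no subset would remove every row that has any missing value.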