Subsetting rows by categorical variables
Subsetting data based on a categorical variable often involves using the "or" operator (|) to select rows from multiple categories. This can get tedious when you want all states in one of three different regions, for example.
Instead, use the .isin() method, which will allow you to tackle this problem by writing one condition instead of three separate ones.
colors = ["brown", "black", "tan"]
condition = dogs["color"].isin(colors)
dogs[condition]
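To make the comparison concrete, here is a minimal sketch that builds the same subset both ways. The dogs DataFrame below is a small made-up stand-in (its names and colors are assumptions for illustration), not the course's actual dataset.

```python
import pandas as pd

# Hypothetical sample data; the real dogs DataFrame is not shown in this exercise
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max"],
    "color": ["brown", "black", "white", "gray", "tan"],
})

# Three separate conditions joined with the "or" operator
with_or = dogs[
    (dogs["color"] == "brown")
    | (dogs["color"] == "black")
    | (dogs["color"] == "tan")
]

# One condition using .isin()
colors = ["brown", "black", "tan"]
with_isin = dogs[dogs["color"].isin(colors)]

# Check whether both approaches select the same rows
print(with_or.equals(with_isin))
```

The .isin() version stays one line long however many categories are in the list, whereas the "or" version grows by one comparison per category.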
homelessness is available and pandas is loaded as pd.
This is a part of the course “Data Manipulation with pandas”.
Exercise instructions
Filter homelessness for cases where the USA census state is in the list of Mojave states, canu, assigning to mojave_homelessness. View the printed result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The Mojave Desert states
canu = ["California", "Arizona", "Nevada", "Utah"]
# Filter for rows in the Mojave Desert states
mojave_homelessness = homelessness[____]
# See the result
print(mojave_homelessness)
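One possible way to fill in the blank is sketched below. It assumes the state names are stored in a column called state, which the instructions above do not confirm, and it uses a small hypothetical stand-in for homelessness so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the homelessness DataFrame;
# the column name "state" is an assumption, not confirmed by the exercise text
homelessness = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Mountain", "Mountain",
               "West South Central", "East North Central"],
    "state": ["California", "Arizona", "Nevada", "Utah", "Texas", "Ohio"],
})

# The Mojave Desert states
canu = ["California", "Arizona", "Nevada", "Utah"]

# Filter for rows in the Mojave Desert states
mojave_homelessness = homelessness[homelessness["state"].isin(canu)]

# See the result
print(mojave_homelessness)
```

With the real dataset, only the expression inside the square brackets should need to change if the column has a different name.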