A little more on subsetting
It's often useful to extract all individuals (cases) in a data frame that have specific characteristics. You can accomplish this through conditioning commands.
First, consider expressions like cdc$gender == "m" or cdc$age > 30 (try them in the console!). These commands produce a series of TRUE and FALSE values. There is one value for each respondent, where TRUE indicates that the person was male or older than 30, respectively.
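To see what such a comparison returns, here is a minimal sketch. Since the cdc data frame is only available inside the exercise workspace, it uses a small made-up data frame called toy (a hypothetical stand-in, not the real data) with gender and age columns.

```r
# Hypothetical stand-in for cdc; the values are invented for illustration
toy <- data.frame(
  gender = c("m", "f", "m", "f"),
  age    = c(25, 42, 57, 30)
)

# Each comparison returns one logical value per row
is_male <- toy$gender == "m"
over_30 <- toy$age > 30

print(is_male)
print(over_30)

# TRUE counts as 1 in arithmetic, so sum() counts the matching rows
print(sum(is_male))
print(sum(over_30))
```

Note that the person aged exactly 30 gets FALSE, because > is a strict comparison.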
Suppose now you want to extract just the data for the men in the sample, or just for those over 30. You can use subset() to do that. For example, the command
subset(cdc, cdc$gender == "m")
will return a data frame that contains only the men from the cdc data frame. (Note the double equal sign: == tests for equality, whereas a single = assigns.)
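The same idea can be tried on a small example. The sketch below again uses a made-up data frame named toy (hypothetical, standing in for cdc) and passes a logical condition as the second argument of subset().

```r
# Hypothetical stand-in for cdc; the values are invented for illustration
toy <- data.frame(
  gender = c("m", "f", "m", "f"),
  age    = c(25, 42, 57, 30)
)

# Keep only the rows where the condition is TRUE
men <- subset(toy, toy$gender == "m")

# subset() evaluates the condition inside the data frame,
# so the bare column name works as well
men_short <- subset(toy, gender == "m")

# The same pattern with a numeric condition
over_30 <- subset(toy, age > 30)

print(men)
print(over_30)
```

Both forms of the condition select the same rows. The shorter one simply saves repeating the data frame name.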
This exercise is part of the course Data Analysis and Statistical Inference.
Exercise instructions
- Use the subset() function to assign the subset of people in very good general health to very_good.
- Assign the subset of people aged over 50 (excluding 50) to age_gt50.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The cdc data frame is already loaded into the workspace
# Create the subsets
very_good <-
age_gt50 <-
# Print the subsets
head(very_good)
head(age_gt50)