
A little more on subsetting

It's often useful to extract all individuals (cases) in a data frame that have specific characteristics. You can accomplish this through conditioning commands.

First, consider expressions like cdc$gender == "m" or cdc$age > 30 (try them in the console!). Each expression returns a logical vector of TRUE and FALSE values, one per respondent. TRUE marks the respondents who are male or older than 30, respectively.
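To see what such a condition returns, here is a minimal sketch on a made-up stand-in for cdc. The data frame toy and its values are hypothetical; only the column names gender and age come from the text above.

```r
# Toy stand-in for the cdc data frame (values are invented for illustration)
toy <- data.frame(
  gender = c("m", "f", "m", "f", "m"),
  age    = c(25, 42, 31, 30, 67)
)

is_male <- toy$gender == "m"   # one TRUE/FALSE value per row
over_30 <- toy$age > 30        # strictly greater than 30, so age 30 gives FALSE

print(is_male)
print(over_30)

# TRUE counts as 1 in arithmetic, so sum() counts the matching rows
sum(is_male)
sum(over_30)
```

Because each condition yields exactly one value per row, it can be used to pick rows out of the data frame.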

Suppose now you want to extract just the data for the men in the sample, or just for those over 30. The subset() function does this: it takes a data frame and a condition, and keeps the rows where the condition is TRUE. For example, the command

subset(cdc, cdc$gender == "m")

will return a data frame that contains only the men from the cdc data frame. (Note the double equal sign! In R, == tests for equality, whereas a single = assigns a value or names a function argument.)
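The same call can be tried on a small made-up data frame. This is a sketch under assumptions: toy and its values are hypothetical, and the two alternative forms are shown only for comparison, not as the course's required approach.

```r
# Toy stand-in for the cdc data frame (values are invented for illustration)
toy <- data.frame(
  gender = c("m", "f", "m", "f", "m"),
  age    = c(25, 42, 31, 30, 67)
)

# Same form as the command in the text
men <- subset(toy, toy$gender == "m")

# subset() evaluates the condition inside the data frame,
# so the column can also be named without the toy$ prefix
men_short <- subset(toy, gender == "m")

# Equivalent row selection with bracket indexing
men_brackets <- toy[toy$gender == "m", ]

nrow(men)                      # number of rows kept
all.equal(men, men_short)      # compare the two subset() forms
all.equal(men, men_brackets)   # compare with bracket indexing
```

One difference worth knowing: subset() drops rows where the condition is NA, while bracket indexing with a logical vector containing NA returns rows filled with NA.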

This exercise is part of the course

Data Analysis and Statistical Inference


Exercise instructions

  • Use the subset() function to assign the subset of people in very good general health to very_good.
  • Assign the subset of people aged over 50 (excluding 50) to age_gt50.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# The cdc data frame is already loaded into the workspace

# Create the subsets
very_good <-
age_gt50 <-

# Print the subsets
head(very_good)
head(age_gt50)
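One way the two subsets might be built, sketched on made-up data rather than the real cdc data frame. The column name genhlth and the label "very good" are assumptions, since the exercise text does not name the general-health column or its values; check names(cdc) and the column's levels before adapting this.

```r
# Toy stand-in for cdc; genhlth and its labels are assumed, values are invented
toy <- data.frame(
  genhlth = c("very good", "good", "very good", "excellent", "fair"),
  age     = c(50, 51, 34, 72, 49)
)

# Rows whose (assumed) general-health label is "very good"
very_good <- subset(toy, toy$genhlth == "very good")

# Rows with age strictly greater than 50, so a 50-year-old is excluded
age_gt50 <- subset(toy, toy$age > 50)

# Print the subsets
head(very_good)
head(age_gt50)
```

Using > rather than >= is what excludes respondents who are exactly 50.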