Simple random sampling

The simplest method of sampling a population is the one you've seen already. It is known as simple random sampling (sometimes abbreviated to "SRS"), and involves picking rows at random, one at a time, where each row has the same chance of being picked as any other.
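As a rough illustration of the idea, the sketch below draws row numbers with base R's sample(). The population size of 1000 and the sample size of 5 are made-up values, not taken from the attrition dataset, and the sketch assumes sampling without replacement.

```r
# Hypothetical sizes, chosen only for illustration
n_pop <- 1000
n_samp <- 5

# Fix the seed so the same rows are picked on every run
set.seed(123)

# Draw row numbers without replacement; sample() gives every
# row number from 1 to n_pop the same chance of being picked
picked_rows <- sample(n_pop, size = n_samp)

print(picked_rows)
```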

To make it easier to see which rows end up in the sample, it's helpful to include a row ID column in the dataset before you take the sample.
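One possible way to do this is sketched below on a small toy dataset. The toy_pop data frame and its columns are hypothetical, and the sketch assumes the tibble package is installed alongside dplyr; rowid_to_column() comes from tibble and slice_sample() from dplyr.

```r
library(dplyr)
library(tibble)

# Toy population: hypothetical data, not the attrition dataset
toy_pop <- tibble(
  employee = paste0("emp_", 1:10),
  tenure_years = c(1, 4, 2, 7, 3, 10, 5, 6, 8, 9)
)

# Fix the seed so the sample is reproducible
set.seed(42)

toy_samp <- toy_pop %>%
  # Record each row's original position in a column named rowid
  rowid_to_column() %>%
  # Pick 3 rows at random, each with the same chance of selection
  slice_sample(n = 3)

print(toy_samp)
```

Each rowid value in the result points back to a position in toy_pop. Because the rows are picked at random, the IDs generally do not come out in ascending order.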

In this chapter, we'll look at sampling methods using a synthetic (fictional) employee attrition dataset from IBM, where "attrition" means leaving the company.

attrition_pop is available; dplyr is loaded.

This exercise is part of the course Sampling in R.

Exercise instructions

View the attrition_pop dataset. Explore it in the viewer until you are clear on what it contains.
Set the random seed to a value of your choosing.
Add a row ID column to the dataset, then use simple random sampling to get 200 rows.
View the sample dataset, attrition_samp. What do you notice about the row IDs?

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# View the attrition_pop dataset
___

# Set the seed
___

attrition_samp <- attrition_pop %>% 
  # Add a row ID column
  ___ %>% 
  # Get 200 rows using simple random sampling
  ___

# View the attrition_samp dataset
___