
Simple random sampling

The simplest method of sampling a population is the one you've seen already. It is known as simple random sampling (sometimes abbreviated to "SRS"), and involves picking rows at random, one at a time, where each row has the same chance of being picked as any other.

To make it easier to see which rows end up in the sample, it's helpful to include a row ID column in the dataset before you take the sample.
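Here is a minimal sketch of the two steps just described: add a row ID column, then take a simple random sample. It assumes dplyr 1.0 or later, where `slice_sample()` is available. `toy_pop` is a hypothetical stand-in for `attrition_pop`, with made-up columns, so the block runs on its own. The row ID is added with `mutate(row_id = row_number())`. tibble's `rowid_to_column()` would do the same job.

```r
library(dplyr)

# Fix the random seed so the same "random" sample is drawn on every run
set.seed(19790801)

# Hypothetical stand-in for attrition_pop: 1,000 rows of made-up data
toy_pop <- data.frame(
  age = sample(18:60, 1000, replace = TRUE),
  attrition = sample(c("No", "Yes"), 1000, replace = TRUE)
)

# Add a row ID column, then draw a simple random sample of 200 rows.
# slice_sample() samples without replacement by default, and each row
# has the same chance of being selected.
toy_samp <- toy_pop %>%
  mutate(row_id = row_number()) %>%
  slice_sample(n = 200)

head(toy_samp$row_id)
```

The ID is added *before* sampling because it records each row's position in the population. An ID added after sampling would number only the sampled rows, so it could not show which population rows were picked.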

In this chapter, we'll look at sampling methods using a synthetic (fictional) employee attrition dataset from IBM, where "attrition" means leaving the company.

attrition_pop is available; dplyr is loaded.

Instructions
  • View the attrition_pop dataset. Explore it in the viewer until you understand what it contains.
  • Set the random seed to a value of your choosing.
  • Add a row ID column to the dataset, then use simple random sampling to get 200 rows.
  • View the sample dataset, attrition_samp. What do you notice about the row IDs?
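To explore the last question without the real data, the following sketch uses a hypothetical `toy_pop` that contains only a row ID column. It shows what you would typically see. The sampled IDs come back in selection order, not sorted. No ID repeats, because `slice_sample()` samples without replacement unless `replace = TRUE` is set.

```r
library(dplyr)

set.seed(2024)

# Hypothetical stand-in population: 1,000 rows holding only a row ID
toy_pop <- data.frame(row_id = seq_len(1000))

toy_samp <- toy_pop %>% slice_sample(n = 200)

# The IDs appear in the order they were drawn, not in population order
head(toy_samp$row_id)
is.unsorted(toy_samp$row_id)         # TRUE

# No ID appears twice: sampling is without replacement by default
anyDuplicated(toy_samp$row_id) == 0  # TRUE

# Sorting the IDs shows which population rows were picked
head(sort(toy_samp$row_id))
```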