Simple random sampling
The simplest method of sampling a population is the one you've seen already. It is known as simple random sampling (sometimes abbreviated to "SRS"), and involves picking rows at random, one at a time, where each row has the same chance of being picked as any other.
To make it easier to see which rows end up in the sample, it's helpful to include a row ID column in the dataset before you take the sample.
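As a minimal base R sketch of the idea above, the following draws a simple random sample from a toy population and roughly checks that every row is equally likely to be picked. The data frame `toy_pop`, its columns, the seed, and the sample size are all made up for illustration; they are not part of the course material.

```r
# Toy population of 10 rows; `toy_pop` and its columns are hypothetical.
toy_pop <- data.frame(row_id = 1:10, value = letters[1:10])

set.seed(2024)  # arbitrary seed, only to make the draw reproducible

# Simple random sample of 4 rows: sample() draws row indices without
# replacement, so every row has the same chance of being selected.
picked <- sample(nrow(toy_pop), size = 4)
toy_samp <- toy_pop[picked, ]
print(toy_samp)

# Rough check of "same chance": across many repeated samples, each row
# should be included in about 4/10 of them.
inclusion <- replicate(5000, 1:10 %in% sample(10, size = 4))
print(round(rowMeans(inclusion), 2))
```

Because `row_id` was stored before sampling, the sampled rows can be traced back to their positions in the population.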
In this chapter, we'll look at sampling methods using a synthetic (fictional) employee attrition dataset from IBM, where "attrition" means leaving the company.
attrition_pop is available; dplyr is loaded.
This exercise is part of the course
Sampling in R
Exercise instructions
- View the attrition_pop dataset. Explore it in the viewer until you are clear on what it contains.
- Set the random seed to a value of your choosing.
- Add a row ID column to the dataset, then use simple random sampling to get 200 rows.
- View the sample dataset, attrition_samp. What do you notice about the row IDs?
Interactive exercise
Try this exercise by completing this sample code.
# View the attrition_pop dataset
___
# Set the seed
___
attrition_samp <- attrition_pop %>%
  # Add a row ID column
  ___ %>%
  # Get 200 rows using simple random sampling
  ___
# View the attrition_samp dataset
___
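One possible way to fill in the blanks is sketched below. It is not necessarily the course's intended solution. It assumes `tibble::rowid_to_column()` for the row ID column and `dplyr::slice_sample()` for the sampling step. Because the real `attrition_pop` is not reproduced here, the sketch builds a small stand-in with invented columns and an arbitrary row count so that it runs on its own; the seed value is also arbitrary.

```r
library(dplyr)
library(tibble)

# Stand-in for attrition_pop (hypothetical columns and size), used only
# so this sketch is self-contained. In the exercise, the real dataset
# is already available and this step is not needed.
attrition_pop <- tibble(
  Age = sample(18:60, 1000, replace = TRUE),
  Attrition = sample(c("No", "Yes"), 1000, replace = TRUE)
)

# View the attrition_pop dataset
# (View(attrition_pop) opens the viewer in an interactive session)
glimpse(attrition_pop)

# Set the seed
set.seed(5643)

attrition_samp <- attrition_pop %>%
  # Add a row ID column (named "rowid" by default)
  rowid_to_column() %>%
  # Get 200 rows using simple random sampling
  slice_sample(n = 200)

# View the attrition_samp dataset
print(attrition_samp)
```

In a sample drawn this way, the `rowid` values are not consecutive and appear in random order, since each row is picked independently of its position in the population.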