
Simple random sampling

The simplest method of sampling a population is the one you've seen already. It is known as simple random sampling (sometimes abbreviated to "SRS"), and involves picking rows at random, one at a time, where each row has the same chance of being picked as any other.
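The definition above can be sketched in base R. This is a minimal illustration on a hypothetical toy data frame (`toy_pop`, with made-up columns and an arbitrary seed), not the attrition data used in the exercise. `sample()` draws row positions without replacement by default, so each row has the same chance of being picked.

```r
# Hypothetical toy population of 10 rows (not the attrition data)
toy_pop <- data.frame(
  employee = paste0("emp_", 1:10),
  age = c(34, 45, 29, 51, 38, 42, 27, 36, 48, 31)
)

set.seed(123)  # arbitrary seed so the draw can be reproduced

# Pick 4 distinct row positions; every row is equally likely to be chosen
picked <- sample(nrow(toy_pop), size = 4)

# Keep only the picked rows
toy_samp <- toy_pop[picked, ]
toy_samp
```

Changing the seed changes which rows are picked; reusing the same seed reproduces the same sample.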

To make it easier to see which rows end up in the sample, it's helpful to include a row ID column in the dataset before you take the sample.

In this chapter, we'll look at sampling methods using a synthetic (fictional) employee attrition dataset from IBM, where "attrition" means leaving the company.

attrition_pop is available; dplyr is loaded.

This exercise is part of the course

Sampling in R


Instructions

  • View the attrition_pop dataset. Explore it in the viewer until you are clear on what it contains.
  • Set the random seed to a value of your choosing.
  • Add a row ID column to the dataset, then use simple random sampling to get 200 rows.
  • View the sample dataset, attrition_samp. What do you notice about the row IDs?

Hands-on interactive exercise

Try this exercise by completing this sample code.

# View the attrition_pop dataset
___

# Set the seed
___

attrition_samp <- attrition_pop %>% 
  # Add a row ID column
  ___ %>% 
  # Get 200 rows using simple random sampling
  ___

# View the attrition_samp dataset
___
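
One possible way to complete the template is sketched below. It is not necessarily the course's reference solution. The row ID column name `row_id`, the seed value, and the stand-in data frame are assumptions made for illustration; the stand-in is only created when `attrition_pop` is not already available, and its size and columns are hypothetical. The row ID is added with `mutate()` and `row_number()` from dplyr; `tibble::rowid_to_column()` would be an alternative if tibble is loaded.

```r
library(dplyr)

# Stand-in used only if attrition_pop is not already available;
# its size and columns are hypothetical, not the real dataset
if (!exists("attrition_pop")) {
  attrition_pop <- data.frame(
    Age = rep(25:64, times = 25),
    Attrition = rep(c("No", "No", "No", "Yes"), times = 250)
  )
}

# View the dataset (opens the viewer only in an interactive session)
if (interactive()) View(attrition_pop)

# Set the seed (any integer works; this value is arbitrary)
set.seed(5643)

attrition_samp <- attrition_pop %>%
  # Add a row ID column recording each row's original position
  mutate(row_id = row_number()) %>%
  # Get 200 rows using simple random sampling (without replacement)
  slice_sample(n = 200)

# View the sample (opens the viewer only in an interactive session)
if (interactive()) View(attrition_samp)
```

Because the row ID is added before sampling, the `row_id` values in `attrition_samp` show which rows of the population were picked.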