Exercise

3 kinds of sampling

You're going to compare the performance of point estimates using simple, stratified, and cluster sampling. Before doing that, you'll have to set up the samples.

You'll use the RelationshipSatisfaction column of the attrition_pop dataset, which categorizes the employee's relationship with the company. It has four levels: Low, Medium, High, and Very_High. pandas has been loaded with its usual alias (pd), and the random package has also been loaded.

Instructions 1/3

  • 1
    • Perform simple random sampling on attrition_pop to get one-quarter of the population, setting the seed to 2022.
  • 2
    • Perform stratified sampling on attrition_pop to sample one-quarter of each RelationshipSatisfaction group, setting the seed to 2022.
  • 3
    • Create a list of unique values from attrition_pop's RelationshipSatisfaction column; assign to satisfaction_unique.
    • Randomly sample satisfaction_unique to get two values; assign to satisfaction_samp.
    • Subset the population for rows where RelationshipSatisfaction is in satisfaction_samp and clear any unused categories from RelationshipSatisfaction; assign to attrition_clust_prep.
    • Perform cluster sampling on the selected satisfaction groups, sampling one-quarter of the population and setting the seed to 2022.
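
The three steps above can be sketched as follows. This is a minimal illustration, not the course's reference solution. Since attrition_pop is not reproduced here, the sketch builds a small hypothetical stand-in with the same four RelationshipSatisfaction levels; the group sizes and the EmployeeID column are invented. The result names attrition_srs, attrition_strat, and attrition_clust are also assumptions. For the cluster step, "one quarter of the population" is read as the total sample size, split evenly across the two chosen clusters, which is only one possible interpretation.

```python
import random

import pandas as pd

# Hypothetical stand-in for attrition_pop (the real dataset is not shown here).
# Group sizes are made up: Low=80, Medium=100, High=120, Very_High=100.
levels = ["Low", "Medium", "High", "Very_High"]
sizes = [80, 100, 120, 100]
values = [lvl for lvl, n in zip(levels, sizes) for _ in range(n)]
attrition_pop = pd.DataFrame({
    "EmployeeID": range(len(values)),
    "RelationshipSatisfaction": pd.Categorical(values, categories=levels),
})

# 1. Simple random sampling: one-quarter of all rows.
attrition_srs = attrition_pop.sample(frac=0.25, random_state=2022)

# 2. Stratified sampling: one-quarter of each satisfaction group.
attrition_strat = (
    attrition_pop
    .groupby("RelationshipSatisfaction", observed=True)
    .sample(frac=0.25, random_state=2022)
)

# 3. Cluster sampling.
# 3a. List the unique satisfaction levels.
satisfaction_unique = list(attrition_pop["RelationshipSatisfaction"].unique())

# 3b. Randomly pick two levels to act as clusters.
random.seed(2022)
satisfaction_samp = random.sample(satisfaction_unique, k=2)

# 3c. Keep only rows in the chosen clusters and drop the unused categories.
in_cluster = attrition_pop["RelationshipSatisfaction"].isin(satisfaction_samp)
attrition_clust_prep = attrition_pop[in_cluster].copy()
attrition_clust_prep["RelationshipSatisfaction"] = (
    attrition_clust_prep["RelationshipSatisfaction"].cat.remove_unused_categories()
)

# 3d. Sample equally from each chosen cluster so the total is about
#     one quarter of the population (an assumed interpretation).
n_per_cluster = len(attrition_pop) // 4 // len(satisfaction_samp)
attrition_clust = (
    attrition_clust_prep
    .groupby("RelationshipSatisfaction", observed=True)
    .sample(n=n_per_cluster, random_state=2022)
)
```

Dropping the unused categories before grouping matters because RelationshipSatisfaction is categorical: without it, the two unselected levels would still be registered as categories even though no rows belong to them.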