Step-by-step through the permutation
To help you understand the code used to create the randomization distribution, this exercise will walk you through the steps of the infer framework. In particular, you'll see how differences in the generated replicates affect the calculated statistics.
After running the infer steps, notice that the counts and the calculated statistic differ slightly from one replicate to the next.
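To see by hand why the replicates differ, here is a minimal base R sketch of what a single permutation does. It assumes a hypothetical toy data frame (toy_disc, not the course's disc data) with the same two character columns, promote and sex, and a hypothetical helper, diff_in_props(). Shuffling promote with sample() is a by-hand stand-in for generate(type = "permute"), not infer's actual implementation. Each shuffle breaks any link between promotion and sex but keeps the total number of promotions fixed, so each shuffle yields a slightly different difference in proportions.

```r
# Hypothetical toy data, NOT the course's disc data frame:
# 10 males (8 promoted) and 10 females (5 promoted)
toy_disc <- data.frame(
  sex = rep(c("male", "female"), each = 10),
  promote = c(rep("promoted", 8), rep("not_promoted", 2),
              rep("promoted", 5), rep("not_promoted", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: proportion promoted among males minus among females
diff_in_props <- function(df) {
  p_male <- mean(df$promote[df$sex == "male"] == "promoted")
  p_female <- mean(df$promote[df$sex == "female"] == "promoted")
  p_male - p_female
}

set.seed(42)  # arbitrary seed so the shuffles are reproducible
perm_diffs <- replicate(5, {
  shuffled <- toy_disc
  # Permute the response; each row keeps its original sex value
  shuffled$promote <- sample(shuffled$promote)
  diff_in_props(shuffled)
})

observed_diff <- diff_in_props(toy_disc)  # 0.8 - 0.5
perm_diffs  # five differences under the null; they typically vary by shuffle
```

Because every shuffle still has 13 promotions in total, each permuted difference is always (2k - 13) / 10 for some count k of promoted males. Only the split between the sexes changes.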
This exercise is part of the course Foundations of Inference in R.
Exercise instructions
The dplyr and infer packages have been loaded for you, along with the disc data frame from the last exercise.
- Call the functions for the first three steps. This work has been done for you; your job is to investigate the results of calling the first three infer steps.
- To see the effect of permuting,
  - group the permuted data frame, disc_perm, by the new replicate variable, then
  - count the variables of interest (promote within each sex) using count().
- Using disc_perm, calculate() the statistic of interest. Set stat to "diff in props" and order to c("male", "female").
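As a sketch of what the counting step and calculate(stat = "diff in props", order = c("male", "female")) work out for each replicate, the block below does the same arithmetic by hand with dplyr. It assumes a hypothetical stand-in for disc_perm (toy_perm, with two replicates of four rows each). It also assumes "diff in props" means the proportion of successes ("promoted") in the first level of order minus that in the second. This is a hand-rolled equivalent for illustration, not infer's internal code.

```r
library(dplyr)

# Hypothetical stand-in for disc_perm: two replicates of the same
# four promote labels, shuffled differently across the sex column
toy_perm <- data.frame(
  replicate = rep(1:2, each = 4),
  sex = rep(c("male", "male", "female", "female"), times = 2),
  promote = c("promoted", "promoted", "promoted", "not_promoted",
              "promoted", "not_promoted", "promoted", "promoted"),
  stringsAsFactors = FALSE
)

# Count promote within each sex, separately for every replicate
toy_perm %>%
  group_by(replicate) %>%
  count(promote, sex)

# "diff in props" with order = c("male", "female"):
# P(promoted | male) - P(promoted | female), computed once per replicate
manual_stats <- toy_perm %>%
  group_by(replicate) %>%
  summarize(
    stat = mean(promote[sex == "male"] == "promoted") -
      mean(promote[sex == "female"] == "promoted")
  )
manual_stats
```

Both replicates contain the same three promotions. Only their assignment to males or females changes, and that alone flips the statistic from 0.5 to -0.5.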
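# Load the packages (already loaded in the exercise environment)
library(dplyr)
library(infer)

# Replicate the entire data frame, permuting the promote variable
disc_perm <- disc %>%
  specify(promote ~ sex, success = "promoted") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5, type = "permute")

disc_perm %>%
  # Group by replicate
  group_by(replicate) %>%
  # Count promote within each sex, per replicate
  count(promote, sex)

disc_perm %>%
  # Calculate difference in proportion, male then female
  calculate(stat = "diff in props", order = c("male", "female"))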