Step-by-step through the permutation
To help you understand the code used to create the randomization distribution, this exercise walks you through the steps of the infer framework. In particular, you'll see how differences in the generated replicates affect the calculated statistics. After running the infer steps, notice that the numbers are slightly different for each replicate.
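To make the idea behind `generate(type = "permute")` concrete, here is a minimal base R sketch of a permutation. It is not infer's implementation; it only mimics the idea. The data frame is a hypothetical stand-in for `disc` (24 files per sex, counts chosen for illustration), and `diff_in_props()` is a hypothetical helper. Shuffling the `promote` labels while leaving `sex` fixed breaks any association between the two, so each shuffle gives its own, slightly different, difference in proportions.

```r
# Hypothetical stand-in for `disc`; counts chosen for illustration only
disc <- data.frame(
  sex = rep(c("male", "female"), each = 24),
  promote = c(rep("promoted", 21), rep("not_promoted", 3),
              rep("promoted", 14), rep("not_promoted", 10))
)

# Hypothetical helper: proportion promoted among males minus among females
diff_in_props <- function(df) {
  p_male   <- mean(df$promote[df$sex == "male"] == "promoted")
  p_female <- mean(df$promote[df$sex == "female"] == "promoted")
  p_male - p_female
}

set.seed(42)  # make the shuffles reproducible
perm_stats <- replicate(5, {
  shuffled <- disc
  # Permute the response only; sex stays in place
  shuffled$promote <- sample(shuffled$promote)
  diff_in_props(shuffled)
})

diff_in_props(disc)  # observed statistic: 21/24 - 14/24 = 7/24
perm_stats           # five permuted statistics, typically all different
```

Because every shuffle keeps the same 35 promotions and the same 24/24 split by sex, each permuted statistic is a multiple of 1/24; only how the promotions land on each sex changes.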
This exercise is part of the course Foundations of Inference in R.
Exercise instructions
The dplyr and infer packages have been loaded for you, along with the disc data frame from the last exercise.
- Call the functions for the first three infer steps. This part has been done for you; your job is to investigate the results of those first three steps.
- To see the effect of permuting:
  - group the permuted data frame, disc_perm, by the new replicate variable, then
  - count the variables of interest (promote within each sex) using count().
- Using disc_perm, calculate() the statistic of interest. Set stat to "diff in props" and order to c("male", "female").
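The grouping and counting steps above can be sketched end to end as follows. This is a minimal sketch, assuming the dplyr and infer packages are installed and using a hypothetical stand-in for `disc` (counts chosen for illustration; the course's data may differ). The extra `summarise()` step, not part of the exercise, checks what permuting preserves: in every replicate the total number promoted stays the same, and only the male/female split of those promotions varies.

```r
library(dplyr)
library(infer)

# Hypothetical stand-in for `disc`; counts chosen for illustration only
disc <- data.frame(
  sex = factor(rep(c("male", "female"), each = 24)),
  promote = factor(c(rep("promoted", 21), rep("not_promoted", 3),
                     rep("promoted", 14), rep("not_promoted", 10)))
)

set.seed(2024)  # fix the shuffles so the output is reproducible
disc_perm <- disc %>%
  specify(promote ~ sex, success = "promoted") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5, type = "permute")

# Counts of promote within each sex, separately for every replicate
perm_counts <- disc_perm %>%
  group_by(replicate) %>%
  count(sex, promote)

# Permuting shuffles only `promote`, so the total promoted per replicate is
# fixed; the number of promoted males changes from replicate to replicate
margins <- perm_counts %>%
  group_by(replicate) %>%
  summarise(
    total_promoted = sum(n[promote == "promoted"]),
    male_promoted  = sum(n[promote == "promoted" & sex == "male"])
  )
margins
```

Looking at `male_promoted` across replicates is a quick way to see why the calculated statistics differ: each replicate assigns the same promotions to the two groups in a different way.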
Code for this exercise:
# Replicate the entire data frame, permuting the promote variable
disc_perm <- disc %>%
  specify(promote ~ sex, success = "promoted") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5, type = "permute")

disc_perm %>%
  # Group by replicate
  group_by(replicate) %>%
  # Count per group: one row per (replicate, sex, promote) combination
  count(sex, promote)

disc_perm %>%
  # Calculate difference in proportion, male then female
  calculate(stat = "diff in props", order = c("male", "female"))
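Because `generate()` draws random shuffles, rerunning the pipeline produces different replicate statistics each time. Calling `set.seed()` before it makes the results reproducible. The sketch below assumes dplyr and infer are installed, uses a hypothetical stand-in for `disc` (counts chosen for illustration), and wraps the pipeline in a hypothetical helper, `perm_diffs()`, so it can be run twice under the same seed.

```r
library(dplyr)
library(infer)

# Hypothetical stand-in for `disc`; counts chosen for illustration only
disc <- data.frame(
  sex = factor(rep(c("male", "female"), each = 24)),
  promote = factor(c(rep("promoted", 21), rep("not_promoted", 3),
                     rep("promoted", 14), rep("not_promoted", 10)))
)

# Hypothetical helper: run the permutation pipeline once, return the statistics
perm_diffs <- function(data, reps = 5) {
  data %>%
    specify(promote ~ sex, success = "promoted") %>%
    hypothesize(null = "independence") %>%
    generate(reps = reps, type = "permute") %>%
    calculate(stat = "diff in props", order = c("male", "female"))
}

set.seed(10)
run_a <- perm_diffs(disc)
set.seed(10)
run_b <- perm_diffs(disc)

identical(run_a$stat, run_b$stat)  # same seed, same shuffles, same statistics
run_a$stat                         # within a run, the replicates still differ
```

Setting the seed does not remove the variation between replicates; it only guarantees that the same set of replicates comes back on every run.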