Randomizing gender discrimination
Recall that we are considering a situation where the numbers of men and women are fixed (representing the resumes) and the number of people promoted is fixed (the managers were able to promote only 35 individuals).
In this exercise, you'll create a randomization distribution of the null statistic with 1000 replicates, as opposed to just 5 in the previous exercise. As a reminder, the statistic of interest is the difference in proportions promoted between genders (i.e., the proportion for males minus the proportion for females). From the original dataset, you can calculate how the promotion rates differ between males and females. Using the specify-hypothesize-generate-calculate workflow in infer, you can calculate the same statistic, but instead of getting a single number, you get a whole distribution. In this exercise, you'll compare that single number from the original dataset to the distribution made by the simulation.
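The workflow described above might look roughly like the following sketch. The exercise text does not show the contents of `disc`, so the data frame here is an assumed reconstruction: 24 male and 24 female resumes, with 21 and 14 promoted respectively (35 promotions in total). The column names `sex` and `promote`, the level names, the seed, and the name `disc_perm` are also assumptions made for illustration.

```r
library(dplyr)
library(infer)

# Assumed reconstruction of `disc` (counts are not given in the exercise text):
# 24 male and 24 female resumes, 21 and 14 promoted, 35 promotions overall.
disc <- data.frame(
  sex = factor(rep(c("male", "female"), each = 24)),
  promote = factor(c(
    rep(c("promoted", "not_promoted"), times = c(21, 3)),   # males
    rep(c("promoted", "not_promoted"), times = c(14, 10))   # females
  ))
)

set.seed(42)  # arbitrary seed so the shuffles are reproducible

disc_perm <- disc %>%
  # Response is promotion status, explanatory variable is sex
  specify(promote ~ sex, success = "promoted") %>%
  # Null hypothesis: promotion is independent of sex
  hypothesize(null = "independence") %>%
  # Shuffle the promotion labels 1000 times
  generate(reps = 1000, type = "permute") %>%
  # Difference in proportions promoted, male minus female, per replicate
  calculate(stat = "diff in props", order = c("male", "female"))

# One row per replicate, with the null statistic in the `stat` column
head(disc_perm)
```

Because each replicate only shuffles the promotion labels, every permuted dataset keeps the same numbers of men and women and the same 35 promotions, which matches the fixed-margins setup recalled at the start of the exercise.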
This exercise is part of the course Foundations of Inference in R.
Have a go at this exercise by completing this sample code.
# Calculate the observed difference in promotion rate
diff_orig <- disc %>%
  # Group by sex
  group_by(___) %>%
  # Summarize to calculate fraction promoted
  ___(prop_prom = ___(___)) %>%
  # Summarize to calculate difference
  ___(stat = ___(___)) %>%
  pull()

# See the result
diff_orig
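As a cross-check on whatever the completed pipeline returns, the observed statistic can also be worked out by hand from the group counts. This is only a sketch in base R: the counts below (21 of 24 male resumes and 14 of 24 female resumes promoted) are assumed rather than stated in the exercise text, and `promoted`, `resumes`, and `diff_check` are hypothetical names.

```r
# Assumed counts (not given in the exercise text); they sum to 35 promotions
promoted <- c(male = 21, female = 14)
resumes  <- c(male = 24, female = 24)

# Proportion promoted within each group
prop_prom <- promoted / resumes

# Statistic of interest: proportion for males minus proportion for females
diff_check <- unname(prop_prom["male"] - prop_prom["female"])
diff_check
# [1] 0.2916667
```

Under these assumed counts the difference is 21/24 - 14/24 = 7/24, so `diff_orig` should come out near 0.29 if `disc` holds the same data.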