Sample size in randomization distribution

We've created two new datasets for you with essentially the same difference in promotion proportions as the original discrimination data. However, one of them (disc_small) is one third the size of the original dataset, and the other (disc_big) is 10 times larger.

Additionally, the same permutation code used previously has been run on the small and big datasets to create small and big distributions of permuted differences in promotion rates (disc_perm_small and disc_perm_big, respectively).
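The permutation step can be sketched outside the exercise's R environment. The Python below is an illustration, not the course's code. It pools the promotion outcomes, shuffles them across the two sexes while keeping the group sizes and the total number promoted fixed, and recomputes the difference in promotion rates each time. The function name `permuted_diffs`, the male-minus-female direction of the difference, and the counts (24 files per sex, 21 men and 14 women promoted) are assumptions made for this example.

```python
import random
import statistics


def permuted_diffs(n_male, n_female, promoted_male, promoted_female, reps, seed=0):
    """Return `reps` permuted differences in promotion rate (male minus female).

    Hypothetical helper: each replicate shuffles the pooled outcomes, so group
    sizes and the total number promoted stay fixed while the labels move.
    """
    rng = random.Random(seed)
    # 1 = promoted, 0 = not promoted, pooled across both sexes
    outcomes = ([1] * promoted_male + [0] * (n_male - promoted_male)
                + [1] * promoted_female + [0] * (n_female - promoted_female))
    diffs = []
    for _ in range(reps):
        rng.shuffle(outcomes)
        male = outcomes[:n_male]      # first n_male shuffled outcomes
        female = outcomes[n_male:]    # remaining outcomes
        diffs.append(sum(male) / n_male - sum(female) / n_female)
    return diffs


# Illustrative counts (assumed): 24 files per sex, 21 men and 14 women promoted.
stats = permuted_diffs(24, 24, 21, 14, reps=1000)
print(len(stats), round(statistics.mean(stats), 3))
```

Because the labels are shuffled at random, the permuted differences should center near zero. Running the same helper on smaller or larger counts gives the analogues of the small and big permutation distributions.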

In this exercise, you'll use these two new distributions to get a sense of how the permuted differences vary with widely different sample sizes. In particular, notice the range of values on the x-axis of each plot.
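The plots should show the spread shrinking as the sample size grows. A standard large-sample approximation gives a rough idea of why. Under the null hypothesis, both groups share a pooled promotion rate p, so the difference in proportions has variance of about p(1 − p)(1/n₁ + 1/n₂). The Python sketch below evaluates that formula. It is an approximation and not something the exercise computes. The sizes (8, 24 and 240 files per sex) and the pooled rate of 35/48 are assumptions chosen to mirror "one third the size" and "10 times larger".

```python
import math


def approx_null_sd(n_male, n_female, p_pooled):
    """Approximate standard deviation of permuted differences in promotion rates.

    Under the null hypothesis both groups share the pooled promotion rate, so the
    difference in proportions has variance of roughly
    p_pooled * (1 - p_pooled) * (1 / n_male + 1 / n_female).
    """
    return math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_male + 1 / n_female))


# Assumed, illustrative setup: equal group sizes and a pooled promotion rate of
# 35/48 in all three datasets; only the number of files per sex changes.
p_pooled = 35 / 48
files_per_sex = {"small": 8, "original": 24, "big": 240}
sds = {name: approx_null_sd(n, n, p_pooled) for name, n in files_per_sex.items()}
for name, sd in sds.items():
    print(f"{name:>8}: about {sd:.3f}")
```

Under these assumptions the approximate spreads are about 0.22, 0.13 and 0.04. Multiplying every group size by a factor k divides the spread by about √k, so the histogram for the big permuted data should look much narrower than the one for the small permuted data.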

1
- Tabulate the small dataset, disc_small. That is, call count(), passing the sex and promote columns, to get a contingency table.
- Do the same with the big dataset, disc_big.
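In the exercise, count(sex, promote) returns one row per (sex, promote) combination, with the number of files in a column named n. The Python sketch below shows the same tabulation idea. The rows are hypothetical stand-ins for a small dataset (8 files per sex) and are not the actual contents of disc_small.

```python
from collections import Counter

# Hypothetical rows standing in for a small dataset: one (sex, promote) pair per file.
rows = ([("male", "promoted")] * 7 + [("male", "not_promoted")] * 1
        + [("female", "promoted")] * 5 + [("female", "not_promoted")] * 3)

# Counting each distinct (sex, promote) combination gives the contingency table
# in long format, similar in spirit to count(sex, promote).
table = Counter(rows)
for (sex, promote), n in sorted(table.items()):
    print(sex, promote, n)
```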

2
- Using the small permutation dataset, disc_perm_small, plot the stat column on the x-axis.
- Add a histogram layer with binwidth 0.01.
- Add a vertical line using geom_vline(), with x-axis intercept diff_orig_small.
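The plot for step 2 bins the permuted differences into bins of width 0.01 and draws a vertical line at the observed difference. The crude text version below, written in Python, illustrates that idea. The binning uses left-closed intervals, which is a simplification and need not match ggplot2's default bin placement. The permuted statistics and the observed value are hypothetical numbers for illustration only.

```python
import math
from collections import Counter


def text_histogram(stats, binwidth, observed):
    """Print a crude text histogram of `stats` with bins of width `binwidth`.

    Bins are left-closed intervals [k * binwidth, (k + 1) * binwidth). The bin
    containing `observed` is flagged, playing the role of the vertical line.
    """
    counts = Counter(math.floor(s / binwidth) for s in stats)
    obs_bin = math.floor(observed / binwidth)
    for k in range(min(min(counts), obs_bin), max(max(counts), obs_bin) + 1):
        marker = "  <- observed" if k == obs_bin else ""
        print(f"{k * binwidth:+.2f} {'#' * counts.get(k, 0)}{marker}")
    return counts


# Hypothetical permuted statistics and observed difference, for illustration only.
perm_stats = [-0.208, -0.125, -0.042, -0.042, 0.042, 0.042, 0.042, 0.125, 0.208, 0.292]
diff_orig = 0.292
counts = text_histogram(perm_stats, 0.01, diff_orig)
# How many permuted differences are at least as large as the observed one
print(sum(s >= diff_orig for s in perm_stats), "of", len(perm_stats))
```

Comparing where the line falls relative to the bulk of the histogram is the visual check the plot supports. A narrower permutation distribution leaves the same observed difference further out in the tail.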

3
Draw the same plot again, this time using the big permutation dataset, disc_perm_big, and an x-axis intercept of diff_orig_big.
