1. Bootstrap CI for difference in two means
A natural next step in your analysis would be to quantify the difference between the two population means using a confidence interval. This short video outlines the bootstrap scheme for estimating the difference between two population parameters of a numerical variable. The following exercises will show that implementing this scheme in R takes only a few tweaks to the pipelines from the infer package that we have been using for simulation-based inference.
2. Bootstrap CI for a difference
Constructing a bootstrap interval for the difference in two means is quite similar to constructing a bootstrap interval for a single mean. The only difference is that now we have two samples to bootstrap from.
First, we take a bootstrap sample from each of the two original samples. That is, from each original sample we draw a random sample with replacement, of the same size as that original sample.
Then, we calculate the bootstrap statistic. This is whatever we are interested in: the difference in means, medians, or another statistic between the two bootstrap samples we generated. We record this value.
We then repeat steps 1 and 2 many times to create a bootstrap distribution.
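To make the resampling steps concrete, here is a minimal sketch in plain Python, not the infer pipeline used in the exercises. The function name `bootstrap_diff_in_means`, its parameters, and the two small data vectors are made up for illustration only.

```python
import random
from statistics import mean


def bootstrap_diff_in_means(sample_a, sample_b, reps=1000, seed=None):
    """Return a list of bootstrap differences in means (a minus b).

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Step 1: resample each original sample with replacement,
        # keeping each bootstrap sample the same size as its original.
        boot_a = rng.choices(sample_a, k=len(sample_a))
        boot_b = rng.choices(sample_b, k=len(sample_b))
        # Step 2: compute and record the bootstrap statistic.
        diffs.append(mean(boot_a) - mean(boot_b))
    # Repeating steps 1 and 2 `reps` times gives the bootstrap distribution.
    return diffs


# Made-up example data, not taken from the course.
group_a = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 12.7, 11.8]
group_b = [9.5, 10.2, 8.7, 9.9, 10.8, 9.1, 8.4]

boot_dist = bootstrap_diff_in_means(group_a, group_b, reps=2000, seed=42)
print(len(boot_dist), round(mean(boot_dist), 2))
```

Note that the two groups are resampled separately, so each bootstrap sample keeps the size of its own group even when the group sizes differ.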
Lastly, we calculate the interval using the percentile or the standard error method we learned earlier in the course.
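As a rough illustration of the two interval methods, the sketch below computes both from a list of bootstrap statistics. It is plain Python rather than infer, and the helper names `percentile_ci` and `se_ci` are hypothetical. One assumption to flag: the standard error version here uses a normal critical value, while the version taught earlier in the course may use a different multiplier, such as a t critical value.

```python
import random
from statistics import NormalDist, stdev


def _quantile(sorted_vals, p):
    """Linear-interpolation quantile of already sorted values, 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def percentile_ci(boot_stats, level=0.95):
    """Percentile method: take the middle `level` share of the bootstrap distribution."""
    alpha = 1 - level
    s = sorted(boot_stats)
    return _quantile(s, alpha / 2), _quantile(s, 1 - alpha / 2)


def se_ci(observed_stat, boot_stats, level=0.95):
    """Standard error method: observed statistic +/- critical value * bootstrap SE.

    Assumption: a normal critical value is used as the multiplier.
    """
    se = stdev(boot_stats)  # SE estimated as the SD of the bootstrap statistics
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return observed_stat - z * se, observed_stat + z * se


# Simulated stand-in for a bootstrap distribution (not real course output).
rng = random.Random(1)
boot_stats = [rng.gauss(2.0, 0.5) for _ in range(2000)]
observed_diff = 2.0  # made-up observed difference in sample means

print(percentile_ci(boot_stats))
print(se_ci(observed_diff, boot_stats))
```

When the bootstrap distribution is roughly symmetric and bell-shaped, the two intervals should come out close to each other; a noticeable gap suggests skew, in which case the percentile interval is usually the more natural summary.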
3. Let's practice!
Now let's try some exercises.