Constructing a CI
You've seen one example of how p-hat can vary upon resampling, but you need to repeat this many times to get a good estimate of its variability. Here you will compute a full bootstrap distribution to estimate the standard error (SE) that will be used to form a confidence interval. You'll use an additional verb from infer, calculate(), to streamline the process of computing many statistics from many resampled datasets.
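The resampling idea behind generate() and calculate() can be sketched in base R without infer. This is a swapped-in illustration, not the course's infer workflow. The `responses` vector is simulated, hypothetical data (with made-up levels "High" and "Other"), not the gss2016 data. Each bootstrap replicate draws a sample of the same size with replacement and records the proportion of "High". The SD of those 500 proportions is the bootstrap SE.

```r
# ASSUMPTION: `responses` is simulated, hypothetical data standing in for a
# survey variable; it is not the gss2016 data used in the exercise.
set.seed(42)
responses <- sample(c("High", "Other"), size = 150, replace = TRUE,
                    prob = c(0.45, 0.55))

# Observed sample proportion of "High"
p_hat <- mean(responses == "High")

# 500 bootstrap resamples: same size as the data, drawn with replacement;
# compute the proportion of "High" in each one
n_reps <- 500
boot_stats <- replicate(n_reps, {
  resample <- sample(responses, size = length(responses), replace = TRUE)
  mean(resample == "High")
})

# Bootstrap standard error = standard deviation of the bootstrap statistics
se_boot <- sd(boot_stats)
p_hat
se_boot
```

As a sanity check, the bootstrap SE for a proportion should land close to the textbook formula sqrt(p_hat * (1 - p_hat) / n).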
Take a moment to inspect the output of calculate(). This function reduces your data frame to just two columns: "replicate", which identifies each bootstrap resample, and "stat", which holds the statistic computed from that resample.
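To picture that two-column layout, here is a rough base-R mimic. It is a sketch under assumptions, not infer's actual object: infer returns a tibble that may carry extra attributes. The `responses` vector is hypothetical simulated data. Each row pairs a replicate number with the proportion of "High" in that resample.

```r
# ASSUMPTION: a base R data frame mimicking the shape of calculate()'s output;
# `responses` is simulated, hypothetical data.
set.seed(1)
responses <- sample(c("High", "Other"), size = 100, replace = TRUE)

n_reps <- 500
boot_dist <- data.frame(
  replicate = seq_len(n_reps),
  # one bootstrap proportion of "High" per replicate
  stat = vapply(seq_len(n_reps), function(i) {
    mean(sample(responses, replace = TRUE) == "High")
  }, numeric(1))
)

head(boot_dist)  # first few replicates and their statistics
dim(boot_dist)   # 500 rows, 2 columns
```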
When you plot your bootstrap distribution, you'll find that it's bell-shaped. It's this shape that allows you to add and subtract two SEs from p-hat to get an approximate 95% confidence interval.
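The "p-hat plus or minus two SEs" rule can be sketched in base R. The data are simulated and hypothetical, and the SE comes from base-R resampling rather than infer. Because the bootstrap distribution is roughly bell-shaped, about 95% of the bootstrap statistics should fall within two SEs of p-hat, and that is what justifies the interval.

```r
# ASSUMPTION: simulated, hypothetical data; not the gss2016 data.
set.seed(2016)
responses <- sample(c("High", "Other"), size = 200, replace = TRUE,
                    prob = c(0.4, 0.6))

p_hat <- mean(responses == "High")

# Bootstrap SE from 500 resamples
boot_stats <- replicate(500, mean(sample(responses, replace = TRUE) == "High"))
se_boot <- sd(boot_stats)

# Approximate 95% confidence interval: p_hat plus/minus two standard errors
ci <- c(lower = p_hat - 2 * se_boot, upper = p_hat + 2 * se_boot)
ci

# Share of bootstrap statistics within two SEs of p_hat (should be near 0.95)
mean(abs(boot_stats - p_hat) <= 2 * se_boot)
```

Calling hist(boot_stats) gives a quick look at the bell shape that makes this rule reasonable.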
This exercise is part of the course Inference for Categorical Data in R.
Have a go at this exercise by completing this sample code.
# Create bootstrap distribution for proportion with High conf
boot_dist <- gss2016 %>%
# Specify the response and success
specify(response = ___, ___ = "___") %>%
# Generate 500 bootstrap reps
generate(___ = ___, type = "bootstrap") %>%
# Calculate proportions
calculate(stat = "___")
# See the result
boot_dist