Constructing a CI

You've seen one example of how p-hat can vary upon resampling, but you need to repeat the process many times to get a good estimate of its variability. Here you will compute a full bootstrap distribution to estimate the standard error (SE), which you will then use to form a confidence interval. You'll use an additional verb from infer, calculate(), to streamline the process of computing many statistics from many datasets.

Take a moment to inspect the output of calculate(). This function reduces your data frame to just two columns: "replicate", which identifies each resampled dataset, and "stat", which holds the statistic computed from it.
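To make that two-column shape concrete, here is a minimal base R sketch of what the resample-then-calculate step does conceptually. It does not use infer or the gss2016 data: the vector high_conf is simulated, and its size and success probability are assumptions chosen only for illustration.

```r
# Hypothetical stand-in data (an assumption, not the gss2016 data):
# 1 = respondent reports "High" confidence, 0 = anything else.
set.seed(42)
high_conf <- rbinom(n = 150, size = 1, prob = 0.15)

n_reps <- 500

# For each replicate, resample the data with replacement (same size as the
# original) and compute the sample proportion of successes.
boot_stats <- vapply(
  seq_len(n_reps),
  function(i) mean(sample(high_conf, size = length(high_conf), replace = TRUE)),
  numeric(1)
)

# One row per replicate: the replicate id and the statistic computed from it.
boot_dist_sketch <- data.frame(replicate = seq_len(n_reps), stat = boot_stats)
head(boot_dist_sketch)
```

Each row summarizes an entire resampled dataset with a single number, which is why 500 replicates leave you with a 500-row, two-column data frame.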

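When you plot your bootstrap distribution, you'll find that it's approximately bell-shaped. It's this shape that allows you to add and subtract two SEs from p-hat to get an approximate 95% interval, because roughly 95% of a bell-shaped (normal) distribution lies within two standard deviations of its center.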
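The "p-hat plus or minus two SEs" rule can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the exercise solution: the data vector x is simulated, and the names p_hat, SE, and ci are chosen here for illustration.

```r
# Hypothetical stand-in data (an assumption): 1 = success, 0 = failure.
set.seed(1)
x <- rbinom(n = 150, size = 1, prob = 0.15)

# Observed sample proportion.
p_hat <- mean(x)

# Bootstrap distribution of the proportion: 500 resamples with replacement.
boot_stats <- replicate(500, mean(sample(x, replace = TRUE)))

# The standard error is estimated by the standard deviation of the
# bootstrap statistics.
SE <- sd(boot_stats)

# Approximate 95% interval: p-hat plus or minus two standard errors.
ci <- c(lower = p_hat - 2 * SE, upper = p_hat + 2 * SE)
ci
```

The interval is centered on the observed p-hat rather than on the mean of the bootstrap statistics; the bootstrap distribution is used only to estimate the spread.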

This exercise is part of the course

Inference for Categorical Data in R

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Create bootstrap distribution for proportion with High conf
boot_dist <- gss2016 %>%
  # Specify the response and success
  specify(response = ___, ___ = "___") %>%
  # Generate 500 bootstrap reps
  generate(___ = ___, type = "bootstrap") %>%
  # Calculate proportions
  calculate(stat = "___")

# See the result
boot_dist
