Resampling from a sample
To investigate how much the estimates of a population proportion change from sample to sample, you will set up two sampling experiments.
In the first experiment, you will simulate repeated samples from a population. In the second, you will choose a single sample from the first experiment and repeatedly resample from that sample: a method called bootstrapping. More specifically:
Experiment 1: Assume the true proportion of people who will vote for Candidate X is 0.6. Repeatedly sample 30 people from the population and measure the variability of \(\hat{p}\) (the sample proportion).
Experiment 2: Take one sample of size 30 from the same population. Repeatedly sample 30 people (with replacement!) from the original sample and measure the variability of \(\hat{p}^*\) (the resample proportion).
It's important to realize that the first experiment relies on knowing the population and is typically impossible in practice. The second relies only on the sample of data and is therefore easy to implement for any statistic. Fortunately, as you will see, the variability in \(\hat{p}\), or the proportion of "successes" in a sample, is approximately the same whether we sample from the population or resample from a sample.
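The two experiments can be sketched in a few lines of base R. This is only an illustrative simulation, not the course's own code: the seed, the number of repetitions (1000), and all variable names are assumptions chosen for the example, while the true proportion (0.6) and the sample size (30) come from the description above.

```r
set.seed(42)    # arbitrary seed, used only to make the sketch reproducible

p_true <- 0.6   # true proportion voting for Candidate X (from the text)
n      <- 30    # people per sample (from the text)
reps   <- 1000  # number of repetitions (assumed for this sketch)

# Experiment 1: repeatedly sample from the population, recording p-hat each time
p_hat <- replicate(reps, mean(rbinom(n, size = 1, prob = p_true)))

# Experiment 2: take a single sample, then repeatedly resample it with replacement
one_sample <- rbinom(n, size = 1, prob = p_true)
p_hat_star <- replicate(reps, mean(sample(one_sample, size = n, replace = TRUE)))

# Compare the variability of the two sets of proportions
sd(p_hat)
sd(p_hat_star)
```

The first standard deviation should land near \(\sqrt{0.6 \times 0.4 / 30} \approx 0.089\). The second depends on the one sample that happened to be drawn, so it will be close to, but generally not identical to, the first.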
We have created 1000 random samples, each of size 30, from the population. The resulting data frame, all_polls, is available in your workspace. Take a look before getting started.
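If you want to experiment outside the exercise workspace, a data frame with a similar shape could be built as in the sketch below. The column names poll and vote, the "yes"/"no" coding, and the name all_polls_sketch are assumptions made for illustration; the real all_polls may be structured differently.

```r
set.seed(1)     # arbitrary seed, used only to make the sketch reproducible

n_polls <- 1000 # number of samples (from the text)
n       <- 30   # people per sample (from the text)

# One row per respondent: a poll identifier and that person's vote.
# Column names and the "yes"/"no" coding are hypothetical.
all_polls_sketch <- data.frame(
  poll = rep(seq_len(n_polls), each = n),
  vote = sample(c("yes", "no"), size = n_polls * n,
                replace = TRUE, prob = c(0.6, 0.4))
)

# Inspect the first few rows
head(all_polls_sketch)
```

Storing the data in this long format, one row per respondent, means each poll's sample proportion can later be computed by grouping on the poll identifier.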
This exercise is part of the course Foundations of Inference in R.
Have a go at this exercise by completing this sample code.
# Compute p-hat for each poll
ex1_props <- all_polls %>%
  # Group by poll
  ___(___) %>%
  # Calculate proportion of yes votes
  ___(stat = ___(___))
# Review the result
ex1_props