Empirical Rule
Many statistics we use in data analysis (including both the sample average and sample proportion) have nice properties that are used to better understand the population parameter(s) of interest.
One such property is that if the variability of the sample proportion (called the standard error, or \(SE\)) is known, then approximately 95% of \(\hat{p}\) values (from different samples) will be within \(2SE\) of the true population proportion.
To check whether that holds in the situation at hand, let's go back to the polls generated by taking many samples from the same population.
The all_polls dataset contains 1000 samples of size 30 from a population in which the probability of voting for Candidate X is 0.6.
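As a rough check on scale, and assuming the standard textbook formula for the standard error of a sample proportion (the exercise itself estimates the SE from the simulated polls rather than from a formula), plugging in \(p = 0.6\) and \(n = 30\) gives:

\[
SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6 \times 0.4}{30}} \approx 0.089, \qquad 2SE \approx 0.179
\]

Under that assumption, sample proportions between roughly 0.42 and 0.78 would count as being within \(2SE\) of the true value.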
Note that you will use the R function sd(), which calculates the variability of any set of numbers. In statistics, when sd() is applied to a variable (e.g., price of house), we call it the standard deviation. When sd() is applied to a statistic (e.g., a set of sample proportions), we call it the standard error.
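The distinction can be illustrated with a small simulation. The sketch below does not use the course's all_polls data. It draws hypothetical polls with rbinom() and applies sd() to the resulting sample proportions. The names prop_yes_sim, se_estimate, and se_theory, as well as the seed, are assumptions made for illustration.

```r
# Minimal sketch: sd() applied to a set of sample proportions estimates the SE.
set.seed(42)        # arbitrary seed, only for reproducibility

n_polls   <- 1000   # number of simulated polls
poll_size <- 30     # voters per poll
p_true    <- 0.6    # true proportion of yes votes

# One simulated sample proportion per poll (hypothetical stand-in for prop_yes)
prop_yes_sim <- rbinom(n_polls, size = poll_size, prob = p_true) / poll_size

# sd() of a statistic across samples: the empirical standard error
se_estimate <- sd(prop_yes_sim)

# Textbook formula for comparison
se_theory <- sqrt(p_true * (1 - p_true) / poll_size)

print(c(se_estimate = se_estimate, se_theory = se_theory))
```

The two printed values should be close but not identical, because se_estimate is itself computed from a finite number of simulated polls.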
This exercise is part of the course
Foundations of Inference in R
Exercise instructions
- Run the code to generate props, the proportion of individuals who are planning to vote yes in each poll. This is based upon ex1_props from previous exercises.
- Add a column, is_in_conf_int, that is TRUE when the sampled proportion of yes votes is less than 2 standard errors away from the true population proportion of yes votes. That is, the absolute difference (abs()) between prop_yes and true_prop_yes is less than twice the sd() of prop_yes.
- Calculate the proportion of sample statistics in the confidence interval, prop_in_conf_int, by taking the mean() of is_in_conf_int.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Proportion of yes votes by poll
props <- all_polls %>%
  group_by(poll) %>%
  summarize(prop_yes = mean(vote == "yes"))

# The true population proportion of yes votes
true_prop_yes <- 0.6

# Proportion of polls within 2SE
props %>%
  # Add column: is prop_yes in 2SE of 0.6
  mutate(is_in_conf_int = ___(___ - ___) < ___ * ___(___)) %>%
  # Calculate proportion in conf int
  summarize(prop_in_conf_int = ___(___))
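
One possible completion of the template, following the instructions above, is sketched here as a self-contained script. Because the course's all_polls object is not available outside the exercise, the sketch builds a simulated stand-in (all_polls_sim) with the same poll and vote columns. That stand-in, the seed, and the names props_sim and result are assumptions made for illustration, so the exact proportion will differ from the course data.

```r
library(dplyr)

set.seed(2024)  # arbitrary seed, only for reproducibility

true_prop_yes <- 0.6
n_polls   <- 1000
poll_size <- 30

# Hypothetical stand-in for all_polls: one row per voter, labelled by poll
all_polls_sim <- data.frame(
  poll = rep(seq_len(n_polls), each = poll_size),
  vote = sample(c("yes", "no"), size = n_polls * poll_size,
                replace = TRUE, prob = c(true_prop_yes, 1 - true_prop_yes))
)

# Proportion of yes votes by poll
props_sim <- all_polls_sim %>%
  group_by(poll) %>%
  summarize(prop_yes = mean(vote == "yes"))

# Proportion of polls whose prop_yes is within 2 standard errors of the truth
result <- props_sim %>%
  mutate(is_in_conf_int = abs(prop_yes - true_prop_yes) < 2 * sd(prop_yes)) %>%
  summarize(prop_in_conf_int = mean(is_in_conf_int))

print(result)
```

If the empirical rule holds for these polls, prop_in_conf_int should land in the neighborhood of 0.95. With only 30 voters per poll the sample proportions take a limited set of values, so some deviation from exactly 95% is expected.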