
Empirical Rule

Many statistics we use in data analysis (including both the sample average and sample proportion) have nice properties that are used to better understand the population parameter(s) of interest.

One such property is the empirical rule: when the sampling distribution of the sample proportion is roughly bell-shaped and its variability (called the standard error, or \(SE\)) is known, approximately 95% of \(\hat{p}\) values (from different samples) fall within \(2SE\) of the true population proportion.

To check whether that holds in the situation at hand, let's go back to the polls generated by taking many samples from the same population.

The all_polls dataset contains 1000 samples of size 30 from a population with a probability of voting for Candidate X equal to 0.6.
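For a sense of scale, the theoretical standard error of a sample proportion is \(\sqrt{p(1-p)/n}\). The sketch below evaluates it for \(p = 0.6\) and \(n = 30\), then builds a stand-in for all_polls. The column names poll and vote come from the exercise code, but sim_polls, the use of rbinom() and the seed are assumptions made for illustration; the course does not show here how all_polls was actually generated.

```r
# Theoretical SE of a sample proportion: sqrt(p * (1 - p) / n)
p <- 0.6
n <- 30
se_theory <- sqrt(p * (1 - p) / n)
se_theory                                # about 0.089
c(p - 2 * se_theory, p + 2 * se_theory)  # roughly 0.42 to 0.78

# Hypothetical stand-in for all_polls: 1000 polls of 30 voters each
set.seed(42)  # arbitrary seed, only for reproducibility
n_polls <- 1000
sim_polls <- data.frame(
  poll = rep(seq_len(n_polls), each = n),
  vote = ifelse(rbinom(n_polls * n, size = 1, prob = p) == 1, "yes", "no")
)
```

With polls this small, \(2SE\) is nearly 0.18, so a single poll can land quite far from 0.6 and still be within the expected range.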

Note that you will use the R function sd(), which computes the standard deviation of any set of numbers. When sd() is applied to a variable (e.g., the price of a house), the result is called the standard deviation. When sd() is applied to a set of values of a statistic (e.g., a set of sample proportions), the result is called the standard error.
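The distinction can be illustrated with a small simulation. This is a sketch under assumed values (votes coded 1/0, a yes-probability of 0.6, samples of size 30); the object names votes and sample_props are hypothetical and not part of the course.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# sd() of a variable: one sample of 30 yes/no votes coded 1/0
votes <- rbinom(30, size = 1, prob = 0.6)
sd(votes)         # standard deviation of the individual votes

# sd() of a statistic: the proportion of yes votes in each of 1000 samples
sample_props <- replicate(1000, mean(rbinom(30, size = 1, prob = 0.6)))
sd(sample_props)  # standard error of the sample proportion
```

The second value should be much smaller than the first, since averaging over 30 votes removes most of the vote-to-vote variability.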

This exercise is part of the course

Foundations of Inference in R

Exercise instructions

  • Run the code to generate props, the proportion of individuals who are planning to vote yes in each poll. This is based upon ex1_props from previous exercises.
  • Add a column, is_in_conf_int, that is TRUE when the sampled proportion of yes votes is less than 2 standard errors away from the true population proportion of yes votes. That is, the absolute difference (abs()) between prop_yes and true_prop_yes is less than twice the sd() of prop_yes.
  • Calculate the proportion of sample statistics in the confidence interval, prop_in_conf_int, by taking the mean() of is_in_conf_int.
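The same three steps can be sketched in base R on simulated data. This is an illustration rather than the exercise's dplyr solution: sim_props and within_2se are hypothetical names, and the proportions are drawn directly with rbinom() instead of being computed from all_polls.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
true_prop <- 0.6

# Hypothetical stand-in for the per-poll proportions: 1000 polls of size 30
sim_props <- rbinom(1000, size = 30, prob = true_prop) / 30

# TRUE when a poll's proportion is less than 2 SE from the true proportion
within_2se <- abs(sim_props - true_prop) < 2 * sd(sim_props)

# Share of polls within 2 SE of the true proportion
mean(within_2se)
```

The result should be close to 0.95, though not exactly: with polls of size 30 the proportions can only take values of the form k/30, so the share moves in small jumps.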

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Proportion of yes votes by poll
props <- all_polls %>% 
  group_by(poll) %>% 
  summarize(prop_yes = mean(vote == "yes"))

# The true population proportion of yes votes
true_prop_yes <- 0.6

# Proportion of polls within 2SE
props %>%
  # Add column: is prop_yes in 2SE of 0.6
  mutate(is_in_conf_int = ___(___ - ___) < ___ * ___(___)) %>%
  # Calculate proportion in conf int
  summarize(prop_in_conf_int = ___(___))