Empirical Rule

Many statistics we use in data analysis (including both the sample average and sample proportion) have nice properties that are used to better understand the population parameter(s) of interest.

One such property is the empirical rule: if the variability of the sample proportion (called the standard error, or \(SE\)) is known and the distribution of \(\hat{p}\) across samples is roughly bell-shaped, then approximately 95% of \(\hat{p}\) values (from different samples) will be within \(2SE\) of the true population proportion.

To check whether that holds in the situation at hand, let's go back to the polls generated by taking many samples from the same population.

The all_polls dataset contains 1000 samples of size 30 from a population with a probability of voting for Candidate X equal to 0.6.
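As a rough worked check, assuming the votes are independent draws so that the standard formula for the standard error of a proportion applies (the exercise itself does not state this formula), the numbers above give:

\[
SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6 \times 0.4}{30}} \approx 0.089,
\qquad
0.6 \pm 2SE \approx (0.42,\ 0.78).
\]

Under that assumption, about 95% of the 1000 sample proportions would be expected to fall between roughly 0.42 and 0.78.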

Note that you will use the R function sd(), which measures the variability of any set of numbers. In statistics, when sd() is applied to a variable (e.g., the price of a house), the result is called the standard deviation. When sd() is applied to a statistic (e.g., a set of sample proportions), the result is called the standard error.

Run the code to generate props, the proportion of individuals who are planning to vote yes in each poll. This is based upon ex1_props from previous exercises.
Add a column, is_in_conf_int, that is TRUE when the sampled proportion of yes votes is less than 2 standard errors away from the true population proportion of yes votes. That is, the absolute difference (abs()) between prop_yes and true_prop_yes is less than twice the sd() of prop_yes.
Calculate the proportion of sample statistics in the confidence interval, prop_in_conf_int, by taking the mean() of is_in_conf_int.
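
The steps above can be sketched in base R. This is a minimal illustration rather than the course's solution. Because all_polls is not available here, the polls are simulated with rbinom() as a stand-in (an assumption), and the names n_polls, poll_size, and se are hypothetical; prop_yes, true_prop_yes, is_in_conf_int, and prop_in_conf_int follow the instructions.

```r
# Assumption: all_polls is not available, so simulate 1000 polls of size 30
# from a population in which the probability of a "yes" vote is 0.6.
set.seed(42)                 # arbitrary seed, only for reproducibility
n_polls <- 1000              # hypothetical name: number of polls
poll_size <- 30              # hypothetical name: respondents per poll
true_prop_yes <- 0.6

# Proportion of "yes" votes in each simulated poll.
prop_yes <- rbinom(n_polls, size = poll_size, prob = true_prop_yes) / poll_size

# sd() applied to a statistic (the sample proportions) gives the standard error.
se <- sd(prop_yes)

# TRUE when a poll's proportion is within 2 standard errors of the truth.
is_in_conf_int <- abs(prop_yes - true_prop_yes) < 2 * se

# mean() of a logical vector is the proportion of TRUE values.
prop_in_conf_int <- mean(is_in_conf_int)
prop_in_conf_int
```

If the empirical rule holds for these polls, prop_in_conf_int should land close to 0.95. The exact value depends on the simulated samples, and because a proportion from 30 respondents can take only a limited set of values, it will not match 0.95 exactly.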
