Exercise 17 - Comparing Within-Poll and Between-Poll Variability

We compute a statistic called the t-statistic by dividing our estimate of \(b_2-b_1\) by its estimated standard error:

$$ \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{s_2^2/N_2 + s_1^2/N_1}} $$

Later we will learn of another approximation for the distribution of this statistic for values of \(N_2\) and \(N_1\) that aren't large enough for the CLT.
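As a rough illustration of the t-statistic formula, here is a minimal Python sketch that uses only the standard library. The two lists of spreads are made-up numbers rather than values from the polls data, and the function name t_statistic is a hypothetical helper introduced only for this example.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(y1, y2):
    """Difference in sample means divided by its estimated standard error."""
    n1, n2 = len(y1), len(y2)
    # variance() is the sample variance (divides by N - 1), i.e. s^2
    se = sqrt(variance(y2) / n2 + variance(y1) / n1)
    return (mean(y2) - mean(y1)) / se


# Hypothetical spreads for two pollsters (illustrative values only)
pollster_1 = [0.02, 0.04, 0.03, 0.05, 0.01]
pollster_2 = [0.05, 0.07, 0.06, 0.04, 0.08]

t = t_statistic(pollster_1, pollster_2)
print(round(t, 2))
```

With large enough \(N_1\) and \(N_2\), a value of this statistic far from 0 would be unlikely under a standard normal approximation if there were no difference between the two pollsters.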

Note that our data has more than two pollsters. We can also test for pollster effect using all pollsters, not just two. The idea is to compare the variability across polls to variability within polls. We can construct statistics to test for effects and approximate their distribution. The area of statistics that does this is called Analysis of Variance or ANOVA. We do not cover it here, but ANOVA provides a very useful set of tools to answer questions such as: is there a pollster effect?

Compute the average and standard deviation for each pollster and examine the variability across the averages and how it compares to the variability within the pollsters, summarized by the standard deviation.

Instructions
  • Group the polls data by pollster.
  • Summarize the average and standard deviation of the spreads for each pollster.
  • Create an object called var that contains three columns: the pollster, the mean spread, and the standard deviation of the spread.
  • Name the mean column avg and the standard deviation column s.
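The steps above can be sketched in plain Python. This is only an illustration of the group-then-summarize idea, not the exercise's expected solution: the rows in polls are made-up values, the pollster names are placeholders, and the real polls data is assumed to have one spread per poll along with a pollster label.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (pollster, spread) rows; the real polls data is not shown here
polls = [
    ("Pollster A", 0.02), ("Pollster A", 0.04), ("Pollster A", 0.03),
    ("Pollster B", 0.05), ("Pollster B", 0.07), ("Pollster B", 0.06),
    ("Pollster C", 0.01), ("Pollster C", 0.03), ("Pollster C", 0.02),
]

# Group the spreads by pollster
groups = defaultdict(list)
for pollster, spread in polls:
    groups[pollster].append(spread)

# One row per pollster with the columns named as in the instructions
var = [
    {"pollster": p, "avg": mean(x), "s": stdev(x)}
    for p, x in groups.items()
]

# Variability across the pollster averages, to compare against each s
between = stdev([row["avg"] for row in var])

for row in var:
    print(row["pollster"], round(row["avg"], 3), round(row["s"], 3))
print("sd of averages:", round(between, 3))
```

In this toy data the standard deviation of the pollster averages is larger than each within-pollster standard deviation, which is the kind of pattern that would suggest a pollster effect.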