Exercise 17 - Comparing Within-Poll and Between-Poll Variability
We compute a statistic called the t-statistic by dividing our estimate of \(b_2-b_1\) by its estimated standard error:
$$ \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{s_2^2/N_2 + s_1^2/N_1}} $$
Later we will learn of another approximation for the distribution of this statistic for values of \(N_2\) and \(N_1\) that aren't large enough for the CLT.
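As a rough illustration of the formula above, the following sketch computes the t-statistic for two pollsters in base R. The vectors `y1` and `y2` hold made-up spreads (they are not taken from the polling data), and the p-value relies on the normal approximation, which assumes both sample sizes are large enough for the CLT.

```r
# Hypothetical spreads for two pollsters (illustrative numbers, not real poll data)
y1 <- c(0.04, 0.02, 0.05, 0.03, 0.06)
y2 <- c(0.01, 0.00, 0.03, -0.01, 0.02)

# Estimated standard error of the difference in averages:
# sqrt(s_2^2 / N_2 + s_1^2 / N_1)
se <- sqrt(var(y2) / length(y2) + var(y1) / length(y1))

# t-statistic: estimate of b_2 - b_1 divided by its estimated standard error
t_stat <- (mean(y2) - mean(y1)) / se

# Two-sided p-value under the normal (CLT) approximation;
# with samples this small the approximation is only a sketch
p_value <- 2 * (1 - pnorm(abs(t_stat)))

t_stat
p_value
```

The same statistic is what `t.test(y2, y1)` reports by default, although that function uses a t-distribution rather than the normal approximation for its p-value.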
Note that our data has more than two pollsters. We can also test for pollster effect using all pollsters, not just two. The idea is to compare the variability across polls to variability within polls. We can construct statistics to test for effects and approximate their distribution. The area of statistics that does this is called Analysis of Variance or ANOVA. We do not cover it here, but ANOVA provides a very useful set of tools to answer questions such as: is there a pollster effect?
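To make the idea of comparing variability across polls to variability within polls concrete, here is a minimal sketch of a one-way ANOVA F-statistic computed by hand in base R. The spreads and pollster labels are made up for illustration, and the variable names (`ss_between`, `ss_within`, `f_stat`) are our own choices rather than anything defined in the course.

```r
# Hypothetical spreads for three pollsters (made-up numbers for illustration)
spread   <- c(0.02, 0.04, 0.03, 0.05, 0.07, 0.06, 0.00, 0.02, 0.01)
pollster <- factor(rep(c("A", "B", "C"), each = 3))

k <- nlevels(pollster)  # number of pollsters
n <- length(spread)     # total number of polls

# Per-pollster averages and sample sizes
group_means <- tapply(spread, pollster, mean)
group_sizes <- tapply(spread, pollster, length)

# Between-pollster variability: how far each pollster average is from the overall average
ss_between <- sum(group_sizes * (group_means - mean(spread))^2)

# Within-pollster variability: how far each poll is from its own pollster's average
ss_within <- sum((spread - ave(spread, pollster))^2)

# F-statistic: ratio of the between and within mean squares
f_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))

# p-value from the F distribution with (k - 1, n - k) degrees of freedom
p_value <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

f_stat
p_value
```

A large F-statistic means the pollster averages differ by more than the within-pollster noise would suggest. Base R's `aov(spread ~ pollster)` should report the same F value.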
Compute the average and standard deviation for each pollster and examine the variability across the averages and how it compares to the variability within the pollsters, summarized by the standard deviation.
This exercise is part of the course
HarvardX Data Science Module 4 - Inference and Modeling
Exercise instructions
- Group the `polls` data by pollster.
- Summarize the average and standard deviation of the spreads for each pollster.
- Create an object called `var` that contains three columns: pollster, mean spread, and standard deviation.
- Be sure to name the column for the mean `avg`, and name the column for standard deviations.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load dplyr (pipe and data verbs) and dslabs (the polls_us_election_2016 data)
library(dplyr)
library(dslabs)

# Execute the following lines of code to filter the polling data and calculate the spread
polls <- polls_us_election_2016 %>%
  filter(enddate >= "2016-10-15" &
           state == "U.S.") %>%
  group_by(pollster) %>%
  filter(n() >= 5) %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100) %>%
  ungroup()
# Create an object called `var` that contains columns for the pollster, mean spread, and standard deviation. Print the contents of this object to the console.
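
One way the grouping and summarizing steps might look is sketched below. To keep the example runnable on its own, it uses a small made-up data frame, `polls_demo`, as a stand-in for `polls`. The object name `pollster_summary` and the column name `s` for the standard deviation are assumptions for illustration.

```r
library(dplyr)

# Hypothetical stand-in for `polls`: three pollsters with made-up spreads
polls_demo <- data.frame(
  pollster = rep(c("A", "B", "C"), each = 3),
  spread   = c(0.02, 0.04, 0.03, 0.05, 0.07, 0.06, 0.00, 0.02, 0.01)
)

# One row per pollster: the average spread and its standard deviation
# (`s` is an assumed column name for the standard deviation)
pollster_summary <- polls_demo %>%
  group_by(pollster) %>%
  summarize(avg = mean(spread), s = sd(spread))

# Print the summary to the console
pollster_summary
```

Applying the same `group_by()` and `summarize()` calls to `polls` and assigning the result to `var` would follow the exercise instructions. Comparing the spread of the `avg` column with the typical size of `s` gives a first impression of between-pollster versus within-pollster variability.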