Exercise 9. Comparing to actual results by pollster - multiple polls
Remake the plot you made for the previous exercise, but only for pollsters that took five or more polls.
You can use the dplyr tools `group_by` and `n` to group data by a variable of interest and then count the number of observations in each group. The function `filter` keeps only the rows piped into it that satisfy your specified condition.
For example:
data %>%
  group_by(variable_for_grouping) %>%
  filter(n() >= 5)
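To see what the grouped filter does, here is a minimal sketch on a made-up data frame. The object `toy` and its values are hypothetical and exist only for illustration; they are not part of the exercise data. Within each group, `n()` evaluates to that group's row count, so the condition keeps or drops whole groups at once.

```r
library(dplyr)

# Hypothetical toy data: pollster "A" has 5 rows, pollster "B" has 2 rows.
toy <- data.frame(
  pollster = c(rep("A", 5), rep("B", 2)),
  d_hat = c(0.01, 0.03, 0.02, 0.04, 0.00, 0.05, -0.01),
  stringsAsFactors = FALSE
)

# n() is evaluated per group, so every row of "A" passes the test
# and every row of "B" fails it.
kept <- toy %>%
  group_by(pollster) %>%
  filter(n() >= 5) %>%
  ungroup()

unique(kept$pollster)  # "A"
nrow(kept)             # 5
```

Calling `ungroup()` at the end is optional here; it simply returns an ordinary ungrouped table so later steps are not affected by the grouping.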
This exercise is part of the course
HarvardX Data Science Module 4 - Inference and Modeling
Exercise instructions
- Define a new variable `errors` that contains the estimated difference in the proportion of voters minus the actual difference on election day, 0.021.
- Group the data by pollster using the `group_by` function.
- Filter the data by pollsters with 5 or more polls.
- Use `ggplot` to create the plot of errors by pollster.
- Add a layer with the function `geom_point`.
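The steps above can be sketched as follows. This is one possible approach, not the official solution. It assumes `polls` has a `pollster` column and a `d_hat` column holding the estimated difference. The small `polls` data frame built here is a hypothetical stand-in with made-up values so the sketch runs on its own; in the exercise the real object is already loaded and that block should be skipped. The intermediate name `polls_5plus` and the axis-label rotation are also illustrative choices.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `polls` (made-up values, illustration only).
polls <- data.frame(
  pollster = c(rep("Pollster A", 6), rep("Pollster B", 5), rep("Pollster C", 2)),
  d_hat = c(0.03, 0.01, 0.04, 0.02, 0.05, 0.00,
            0.06, 0.02, -0.01, 0.03, 0.04,
            0.07, 0.01),
  stringsAsFactors = FALSE
)

polls_5plus <- polls %>%
  mutate(errors = d_hat - 0.021) %>%  # estimated minus actual difference
  group_by(pollster) %>%
  filter(n() >= 5) %>%                # keep pollsters with 5 or more polls
  ungroup()

# One point per poll, with pollsters along the x-axis.
p <- ggplot(polls_5plus, aes(x = pollster, y = errors)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # optional: rotate labels

print(p)
```

With the toy data, "Pollster C" has only two polls and is dropped before plotting. The filtered table could equally be piped straight into `ggplot()` instead of being stored first; storing it just makes the intermediate result easy to inspect.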
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The `polls` object has already been loaded. Examine it using the `head` function.
head(polls)
# Add a variable called `errors` to the object `polls` that contains the difference between `d_hat` and the actual difference on election day. Then make a plot of the errors stratified by pollster, but only for pollsters that took 5 or more polls.