
Exercise 9. Comparing to actual results by pollster - multiple polls

Remake the plot you made for the previous exercise, but only for pollsters that took five or more polls.

You can use the dplyr tools group_by and n to group data by a variable of interest and then count the number of observations in each group. The function filter keeps only the rows that satisfy the condition you specify; when the data are grouped and the condition uses n(), whole groups are kept or dropped.

For example:

data %>%
    group_by(variable_for_grouping) %>%
    filter(n() >= 5)
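
To see what this pattern does, here is a minimal sketch on made-up data. The data frame `toy` and its columns are hypothetical and are not part of the course's `polls` object; the only assumption is that dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: pollster "A" has 6 rows, "B" has 2, "C" has 5
toy <- data.frame(
  pollster = rep(c("A", "B", "C"), times = c(6, 2, 5)),
  value = seq_len(13)
)

# Within each group, n() is that group's row count, so the condition is
# TRUE or FALSE for every row of a group at once
kept <- toy %>%
  group_by(pollster) %>%
  filter(n() >= 5) %>%
  ungroup()

# Rows remaining per pollster
count(kept, pollster)
```

Only the groups with at least five rows survive the filter, so "B" is dropped entirely rather than being trimmed.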

This exercise is part of the course

HarvardX Data Science Module 4 - Inference and Modeling


Exercise instructions

  • Define a new variable errors that contains the difference between d_hat, the estimated difference in the proportion of voters, and the actual difference on election day, 0.021.
  • Group the data by pollster using the group_by function.
  • Filter the data by pollsters with 5 or more polls.
  • Use ggplot to create the plot of errors by pollster.
  • Add a layer with the function geom_point.
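
The steps listed above could be sketched roughly as follows. This is one possible approach, not the official solution. Because the real `polls` object is only available inside the exercise, the sketch uses a hypothetical stand-in called `toy_polls` with invented values; only the column names `pollster` and `d_hat` are taken from the exercise text.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `polls`; the values are made up for illustration
toy_polls <- data.frame(
  pollster = rep(c("Pollster A", "Pollster B", "Pollster C"), times = c(5, 3, 7)),
  d_hat = c(0.03, 0.01, 0.05, 0.02, 0.04,
            0.00, 0.06, 0.02,
            0.01, 0.03, 0.02, 0.05, 0.04, 0.00, 0.03)
)

plot_data <- toy_polls %>%
  mutate(errors = d_hat - 0.021) %>%  # estimate minus actual difference
  group_by(pollster) %>%
  filter(n() >= 5)                    # keeps only pollsters with 5+ polls

# One point per poll, pollsters on the x-axis
p <- ggplot(plot_data, aes(pollster, errors)) +
  geom_point()

print(p)
```

With the real data, replacing `toy_polls` with `polls` should be the only change needed, assuming `polls` has the same two columns.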

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# The `polls` object has already been loaded. Examine it using the `head` function.
head(polls)

# Add a variable called `errors` to the object `polls` that contains the difference between d_hat and the actual difference on election day (0.021). Then make a plot of the errors stratified by pollster, but only for pollsters that took 5 or more polls.