Exercise 1 - Confidence Intervals of Polling Data

For each poll in the polling data set, use the CLT to create a 95% confidence interval for the spread. Create a new table called cis that contains columns for the lower and upper limits of the confidence intervals.

This exercise is part of the course

HarvardX Data Science Module 4 - Inference and Modeling

Exercise instructions

  • Use pipes %>% to pass the polls object on to the mutate function, which creates new variables.
  • Create a variable called X_hat that contains the estimate of the proportion of Clinton voters for each poll.
  • Create a variable called se that contains the standard error of the spread.
  • Calculate the confidence intervals using the qnorm function and your calculated se, storing the limits in variables called lower and upper.
  • Use the select function to keep the following columns: state, startdate, enddate, pollster, grade, spread, lower, upper.
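
The quantities named in these steps can be related as follows. This is a sketch under the assumption of a two-candidate race, so that the spread is twice the Clinton proportion minus one; the symbols \hat{d} (the poll's spread), N (the poll's samplesize) and z_{0.975} (the value returned by qnorm(0.975), roughly 1.96) are notation introduced here for illustration and do not appear in the exercise text.

```latex
\begin{align*}
\hat{d} &= 2\hat{X} - 1
  \quad\Longrightarrow\quad
  \hat{X} = \frac{\hat{d} + 1}{2} \\
\widehat{\mathrm{SE}}(\hat{d}) &= 2\sqrt{\frac{\hat{X}\,(1 - \hat{X})}{N}} \\
\text{lower},\ \text{upper} &= \hat{d} \mp z_{0.975}\,\widehat{\mathrm{SE}}(\hat{d})
\end{align*}
```

The factor of 2 in the standard error arises because the spread is a linear function of the proportion with slope 2.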

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load the libraries and data
library(dplyr)
library(dslabs)
data("polls_us_election_2016")

# Create a table called `polls` that filters by state and date, and reports the spread
polls <- polls_us_election_2016 %>% 
  filter(state != "U.S." & enddate >= "2016-10-31") %>% 
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)

# Create an object called `cis` that has the columns indicated in the instructions
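
One possible way to complete the final step is sketched below. It is not necessarily the course's reference solution. It assumes a two-candidate approximation, so that the Clinton proportion can be recovered from the spread as (spread + 1) / 2, and it assumes the poll sample size is held in the samplesize column of polls_us_election_2016.

```r
# Load the libraries and data (repeated so this sketch runs on its own)
library(dplyr)
library(dslabs)
data("polls_us_election_2016")

# Same filtering step as in the sample code above
polls <- polls_us_election_2016 %>%
  filter(state != "U.S." & enddate >= "2016-10-31") %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)

# Build 95% confidence intervals for the spread of each poll
cis <- polls %>%
  mutate(
    # Assumption: two-candidate race, so spread = 2 * X_hat - 1
    X_hat = (spread + 1) / 2,
    # Standard error of the spread is twice that of the proportion
    se = 2 * sqrt(X_hat * (1 - X_hat) / samplesize),
    # qnorm(0.975) is the z-value that leaves 2.5% in each tail
    lower = spread - qnorm(0.975) * se,
    upper = spread + qnorm(0.975) * se
  ) %>%
  select(state, startdate, enddate, pollster, grade, spread, lower, upper)

head(cis)
```

Each row of cis then holds one poll's observed spread together with the limits of its interval, which is centered on that spread.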