Exercise 1 - Confidence Intervals of Polling Data
For each poll in the polling data set, use the CLT to create a 95% confidence interval for the spread. Create a new table called `cis` that contains columns for the lower and upper limits of the confidence intervals.
This exercise is part of the course
HarvardX Data Science Module 4 - Inference and Modeling
Exercise instructions
- Use pipes (`%>%`) to pass the `polls` object on to the `mutate` function, which creates new variables.
- Create a variable called `X_hat` that contains the estimate of the proportion of Clinton voters for each poll.
- Create a variable called `se` that contains the standard error of the spread.
- Calculate the confidence intervals using the `qnorm` function and your calculated `se`.
- Use the `select` function to keep the following columns: `state`, `startdate`, `enddate`, `pollster`, `grade`, `spread`, `lower`, `upper`.
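The quantities named in the instructions can be written out as formulas. This is a sketch under two assumptions that the exercise text does not state explicitly: that the spread $d$ is treated as $2p - 1$ (the two-party simplification, where $p$ is the proportion of Clinton voters), and that $N$ is the poll's sample size.

```latex
\begin{align}
\hat{X} &= \frac{d + 1}{2}
  && \text{estimated proportion of Clinton voters} \\
\widehat{\mathrm{SE}}(d) &= 2\sqrt{\frac{\hat{X}\,(1 - \hat{X})}{N}}
  && \text{standard error of the spread} \\
\text{lower},\ \text{upper} &= d \mp z_{0.975}\,\widehat{\mathrm{SE}}(d)
  && \text{95\% confidence limits}
\end{align}
```

Here $z_{0.975}$ is the value returned by `qnorm(0.975)`, roughly 1.96. The factor of 2 in the standard error follows from $d = 2p - 1$: rescaling a proportion by 2 doubles its standard error.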
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the libraries and data
library(dplyr)
library(dslabs)
data("polls_us_election_2016")
# Create a table called `polls` that filters by state, date, and reports the spread
polls <- polls_us_election_2016 %>%
  filter(state != "U.S." & enddate >= "2016-10-31") %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
# Create an object called `cis` that has the columns indicated in the instructions
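One possible way to complete the exercise is sketched below. It is not necessarily the course's reference solution. It assumes the two-party simplification `X_hat = (spread + 1) / 2` and that the sample size of each poll is stored in the `samplesize` column of `polls_us_election_2016`. The setup lines are repeated so the block runs on its own.

```r
# Load the libraries and data (repeated so this block is self-contained)
library(dplyr)
library(dslabs)
data("polls_us_election_2016")

# Same `polls` table as in the sample code
polls <- polls_us_election_2016 %>%
  filter(state != "U.S." & enddate >= "2016-10-31") %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)

# Sketch of `cis`: assumes spread = 2p - 1 and uses `samplesize` as N
z <- qnorm(0.975)  # about 1.96 for a 95% interval
cis <- polls %>%
  mutate(X_hat = (spread + 1) / 2,                          # estimated Clinton proportion
         se    = 2 * sqrt(X_hat * (1 - X_hat) / samplesize), # standard error of the spread
         lower = spread - z * se,
         upper = spread + z * se) %>%
  select(state, startdate, enddate, pollster, grade, spread, lower, upper)

head(cis)
```

Each row of `cis` then describes one poll, with `lower` and `upper` bracketing its observed `spread`. Polls with a missing `samplesize` would get `NA` limits under this sketch.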