Exercise 3 - Stratify by Pollster and Grade
Now find the proportion of hits for each pollster. Show only pollsters with at least 5 polls and order them from best to worst. Show the number of polls conducted by each pollster and the FiveThirtyEight grade of each pollster.
This exercise is part of the course
HarvardX Data Science Module 4 - Inference and Modeling
Exercise instructions
- Create an object called `p_hits` that contains the proportion of intervals that contain the actual spread, using the following steps.
- Use the `mutate` function to create a new variable called `hit` that contains a logical vector for whether the `actual_spread` falls between the `lower` and `upper` bounds of the confidence interval.
- Use the `group_by` function to group the data by pollster.
- Use the `filter` function to keep only pollsters that have at least 5 polls.
- Summarize the proportion of values in `hit` that are true as a variable called `proportion_hits`. Also create new variables for the number of polls by each pollster (`n`) using the `n()` function and the grade of each pollster (`grade`) by taking the first row of the grade column.
- Use the `arrange` function to arrange the `proportion_hits` in descending order.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The `cis` data have already been loaded for you
add <- results_us_election_2016 %>% mutate(actual_spread = clinton/100 - trump/100) %>% select(state, actual_spread)
ci_data <- cis %>% mutate(state = as.character(state)) %>% left_join(add, by = "state")
# Create an object called `p_hits` that summarizes the proportion of hits for each pollster that has at least 5 polls.
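The steps in the instructions can be sketched as a single dplyr pipeline. This is a minimal illustration, not the official solution: so that it runs on its own, it replaces `ci_data` with a small made-up table (the pollster names, grades, and interval values are hypothetical), assuming the real `ci_data` has `pollster`, `grade`, `lower`, `upper`, and `actual_spread` columns.

```r
library(dplyr)

# Hypothetical toy stand-in for `ci_data`; all values are made up for illustration.
ci_data <- tibble(
  pollster = c(rep("Pollster A", 5), rep("Pollster B", 6), rep("Pollster C", 2)),
  grade    = c(rep("A", 5), rep("B", 6), rep("C", 2)),
  lower    = c(-0.02, 0.00, 0.01, -0.05, 0.03,
                0.04, 0.05, -0.01, 0.06, 0.01, 0.07,
                0.00, 0.01),
  upper    = c( 0.06, 0.08, 0.09,  0.03, 0.11,
                0.10, 0.11,  0.07, 0.12, 0.08, 0.13,
                0.05, 0.06),
  actual_spread = 0.02  # recycled to every row
)

p_hits <- ci_data %>%
  # TRUE when the interval covers the actual spread
  mutate(hit = lower <= actual_spread & actual_spread <= upper) %>%
  group_by(pollster) %>%
  # n() counts rows within each group, so this keeps pollsters with 5+ polls
  filter(n() >= 5) %>%
  summarize(proportion_hits = mean(hit),  # mean of a logical = proportion TRUE
            n = n(),
            grade = grade[1]) %>%
  arrange(desc(proportion_hits))

p_hits
```

Because `filter` runs after `group_by`, `n()` is evaluated per pollster rather than over the whole table, which is why the grouping step has to come first. In the toy data, "Pollster C" has only two polls and is dropped.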