Null sampling distribution of the slope
In the previous chapter, you investigated the sampling distribution of the slope from a population where the slope was non-zero. Typically, however, to do inference, you will need to know the sampling distribution of the slope under the hypothesis that there is no relationship between the explanatory and response variables. Additionally, in most situations, you don't know the population from which the data came, so the null sampling distribution must be derived from only the original dataset.
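One common way to build a null sampling distribution from only the original dataset is to permute the response. Shuffling y breaks any link with x while keeping every observed value of both variables, so slopes refit on shuffled data behave like draws under the hypothesis of no relationship. The sketch below illustrates the idea in base R rather than with infer. It runs on simulated, hypothetical data, not the twins data, and the names x, y, slope_of() and reps are illustrative choices.

```r
# Sketch: approximate the null sampling distribution of the slope by
# permuting the response (hypothetical simulated data, not the twins data).
set.seed(1)

# Hypothetical sample with a genuine positive relationship
n <- 27
x <- rnorm(n, mean = 100, sd = 15)
y <- 0.9 * x + rnorm(n, sd = 8)

# Helper (illustrative): least-squares slope of y on x
slope_of <- function(x, y) unname(coef(lm(y ~ x))[2])

# Observed slope from the original data
obs <- slope_of(x, y)

# Refit after shuffling y; each shuffle is one draw under "no relationship"
reps <- 1000
null_slopes <- replicate(reps, slope_of(x, sample(y)))

# The permutation distribution is centered near zero
mean(null_slopes)

# Two-sided p-value: share of null slopes at least as extreme as observed
p_value <- mean(abs(null_slopes) >= abs(obs))
p_value
```

Permuting rather than resampling pairs is what makes this a null distribution: any slope far from zero in the shuffled data arises by chance alone.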
In the mid-20th century, a study tracked down identical twins who had been separated at birth: one child was raised in the home of their biological parents and the other in a foster home. To help answer whether intelligence is the result of nature or nurture, both children were given IQ tests. The resulting data contain the IQs of the foster twins (Foster, the response variable) and of the biological twins (Biological, the explanatory variable).
In this exercise, you'll use the pull() function. This function takes a data frame and returns a selected column as a vector (similar to $).
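A short illustration of pull() next to the $ operator, using R's built-in mtcars data as a stand-in since the twins data is not assumed to be loaded here. It relies only on dplyr's documented pull(.data, var = -1) behaviour, where the default var = -1 selects the last column.

```r
library(dplyr)

# pull() extracts one column of a data frame as a plain vector
mpg_pull <- mtcars %>% pull(mpg)

# The same vector via base R's $ operator
mpg_dollar <- mtcars$mpg
identical(mpg_pull, mpg_dollar)

# With no column given, pull() returns the last column (here, carb)
last_col <- mtcars %>% pull()
```

Because pull() takes the data frame as its first argument, it fits at the end of a %>% pipeline, which is where $ is awkward to use.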
This exercise is part of the course Inference for Linear Regression in R.
The code below calculates the observed slope of the regression of Foster on Biological in the twins data.
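library(infer)
library(dplyr)  # filter(), pull()
library(broom)  # tidy()
# Calculate the observed slope
# Run a linear regression of Foster vs. Biological on the twins data
obs_slope <- lm(Foster ~ Biological, data = twins) %>%
  # Tidy the result into one row per model term
  tidy() %>%
  # Filter for the row where term equals "Biological"
  filter(term == "Biological") %>%
  # Pull out the estimate column as a numeric vector
  pull(estimate)
# See the result
obs_slope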
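As a cross-check on what the pipeline extracts, the same slope can be read straight from coef(), or computed from the closed-form least-squares formula cov(x, y) / var(x). The twins data is not assumed to be available here, so this sketch uses R's built-in mtcars data (mpg regressed on wt) as a stand-in; the variable names are illustrative.

```r
# Stand-in data: mtcars ships with R (the twins data is not assumed here)
fit <- lm(mpg ~ wt, data = mtcars)

# Base-R route: index the named coefficient vector, drop the name
slope_coef <- unname(coef(fit)["wt"])

# Closed-form least-squares slope: cov(x, y) / var(x)
slope_formula <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)

# Both routes agree (up to floating-point tolerance)
all.equal(slope_coef, slope_formula)
slope_coef
```

The tidy() route returns a data frame with columns such as term, estimate, std.error, statistic and p.value, which is why filtering on term and pulling estimate yields the slope.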