Exercise

Null sampling distribution of the slope

In the previous chapter, you investigated the sampling distribution of the slope from a population where the slope was non-zero. Typically, however, to do inference, you will need to know the sampling distribution of the slope under the hypothesis that there is no relationship between the explanatory and response variables. Additionally, in most situations, you don't know the population from which the data came, so the null sampling distribution must be derived from only the original dataset.

In the mid-20th century, a study tracked down identical twins who were separated at birth: one child was raised in the home of their biological parents and the other in a foster home. In an attempt to answer the question of whether intelligence is the result of nature or nurture, both children were given IQ tests. The resulting data give the IQs of the foster twins (Foster, the response variable) and the IQs of the biological twins (Biological, the explanatory variable).

In this exercise you'll use the pull() function. This function takes a data frame and returns a selected column as a vector (similar to $).
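To see how pull() compares with $, here is a minimal sketch. It assumes the dplyr package is installed, and it uses a small made-up data frame, twins_demo (a hypothetical stand-in, not the real twins data), so that it runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for the twins data: made-up IQ values for illustration only
twins_demo <- data.frame(
  Foster     = c(82, 90, 91, 115, 115, 129, 131, 95),
  Biological = c(82, 88, 97, 105, 120, 125, 132, 100)
)

# pull() takes a data frame and returns the selected column as a plain vector
foster_vec <- twins_demo %>% pull(Foster)

# Check that this is the same vector you would get with $
identical(foster_vec, twins_demo$Foster)
```

pull() is convenient at the end of a pipeline, where $ would force you to break the chain or wrap the whole expression in parentheses.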

Instructions 1/2
  • 1
    • Run a linear regression of Foster vs. Biological on the twins dataset.
    • Tidy the result.
    • Filter for rows where term equals "Biological".
    • Use pull() to pull out the estimate column.
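The four steps above could be chained roughly as follows. This is a sketch, not the official solution: it assumes dplyr and broom are installed, and it fits the model to a made-up data frame called twins_demo (a hypothetical name with illustrative values) so that the code is self-contained. In the exercise you would use twins instead.

```r
library(dplyr)
library(broom)

# Hypothetical stand-in for the twins data: made-up IQ values for illustration only
twins_demo <- data.frame(
  Foster     = c(82, 90, 91, 115, 115, 129, 131, 95),
  Biological = c(82, 88, 97, 105, 120, 125, 132, 100)
)

# Fit the regression, tidy the coefficients into a data frame,
# keep the slope row, and extract its estimate as a single number
obs_slope <- lm(Foster ~ Biological, data = twins_demo) %>%
  tidy() %>%
  filter(term == "Biological") %>%
  pull(estimate)

obs_slope
```

The name obs_slope is an assumption made here for readability. The value it holds is the observed slope that the permuted slopes are later compared against.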
  • 2

    Simulate 10 slopes.

    • Use specify() to specify Foster vs. Biological (same formula as for a linear regression).
    • Use hypothesize() to set a null hypothesis of "independence".
    • Use generate() to generate 10 replicates (reps) of type "permute".
    • Use calculate() to calculate the summary statistic "slope".
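One way the permutation steps might fit together is sketched below. It assumes the infer and dplyr packages are installed and again uses the made-up twins_demo data frame (hypothetical values, not the study data). The seed value and the name perm_slopes are arbitrary choices made for this illustration.

```r
library(dplyr)
library(infer)

# Hypothetical stand-in for the twins data: made-up IQ values for illustration only
twins_demo <- data.frame(
  Foster     = c(82, 90, 91, 115, 115, 129, 131, 95),
  Biological = c(82, 88, 97, 105, 120, 125, 132, 100)
)

set.seed(4747)  # arbitrary seed so the shuffles are reproducible

# Each replicate shuffles the response relative to the explanatory variable,
# which breaks any real relationship, and then recomputes the slope
perm_slopes <- twins_demo %>%
  specify(Foster ~ Biological) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 10, type = "permute") %>%
  calculate(stat = "slope")

perm_slopes
```

The result has one row per replicate, with the permuted slope in the stat column. Ten replicates are enough to see the mechanics, but far too few to describe the null distribution; a real analysis would typically use many more.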