1. Simulation-based Inference
In this chapter, we'll be working with twin data collected in the mid-20th century by Cyril Burt. The data are interesting to work with, but Burt's research is the subject of some controversy, including the possibility that the data were falsified. Let's work with the data anyway.
2. Twins - original data
The twin study tracked down identical twins who were separated at birth: one child was raised in the home of their biological parents and the other in a foster home. In an attempt to answer the question of whether intelligence is the result of nature or nurture, both children were given IQ tests.
The resulting data give the IQs of the twins raised in foster homes (`Foster` is the response variable) and the IQs of the twins raised by their biological parents (`Biological` is the explanatory variable).
3. Pairs of twins
Note that the Biological twin with the highest IQ is paired with the Foster twin with the highest IQ. Also, the Biological twin with the lowest IQ is paired with the Foster twin with the lowest IQ. If "nature" is playing a strong role here, then we would expect to see such a pattern. If "nature" is not playing a strong role, we would think that the observed pattern is simply due to chance.
What if we permuted the Foster twins such that they were randomly matched-up with each Biological twin? Would we see the same linear pattern? Let's see...
4. Twin data
The dataset itself can be thought of as two columns: the IQ value of the biological twin and the IQ value of the foster twin. As evidenced by the graphic, the IQ values are *paired*. That is, the two IQ values in each row come from one pair of twins.
5. Permuted twin data
This figure now demonstrates the act of permuting the foster twins so that the IQ values are the same numbers but they are not matched to their biological twins.
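The shuffling shown in the figure can be sketched in a few lines of base R. This is only an illustration: `twins_sim` is a hypothetical stand-in built from simulated numbers (30 made-up pairs), not Burt's measurements, and `twins_perm` is a name chosen here.

```r
# Hypothetical stand-in data: simulated IQ pairs, NOT Burt's measurements.
set.seed(1)
biological <- round(rnorm(30, mean = 100, sd = 15))
twins_sim <- data.frame(
  Biological = biological,
  Foster = round(biological + rnorm(30, mean = 0, sd = 7))
)

# Shuffle only the Foster column; the Biological column stays in place,
# so each row no longer describes one pair of twins.
twins_perm <- twins_sim
twins_perm$Foster <- sample(twins_sim$Foster)

head(twins_perm)
```

The permuted column holds exactly the same IQ values as before; only their pairing with the biological twins has changed.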
6. Permuted data (1) plotted
The permuted data are plotted in the same way that the original data were plotted. Notice that the permuted data do not have any of the structure given in the original data. That is because in the original data, each point represents a pair of twins. In the permuted data, the twins have been mixed up, so the points no longer represent one family unit.
7. Permuted data (2) plotted
A different permutation shows the same idea: when each point is no longer associated with one *pair* of twins, there is not an association between the IQ values.
8. Permuted data (1) and (2)
Note that the two permuted datasets are not identical. Indeed, by permuting the data, we get information about how the line would *vary* if there were no relationship between the IQs of the biological and foster kids. That is, how does the line vary if the null hypothesis is true and the slope is actually zero?
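To make the variability concrete, here is a minimal sketch that fits a line to two separate permutations and compares the slopes. The data are simulated (hypothetical, not Burt's), and `perm_slope()` is a helper invented for this illustration.

```r
# Hypothetical stand-in data: simulated IQ pairs, NOT Burt's measurements.
set.seed(2)
biological <- rnorm(30, mean = 100, sd = 15)
twins_sim <- data.frame(
  Biological = biological,
  Foster = biological + rnorm(30, mean = 0, sd = 7)
)

# Illustrative helper: shuffle Foster, refit the line, return the slope.
perm_slope <- function(data) {
  shuffled <- sample(data$Foster)
  unname(coef(lm(shuffled ~ data$Biological))[2])
}

slope_1 <- perm_slope(twins_sim)
slope_2 <- perm_slope(twins_sim)
c(slope_1, slope_2)
```

Each call produces a different slope, which is exactly the permutation-to-permutation variation described above.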
9. Linear model on permuted data
Using the infer package, we repeatedly permute the response variable so that any pattern in the linear model is due to random chance (and not an underlying relationship). As with the infer package on other statistics, the steps used here are:
1. specify the linear model: we are predicting Foster IQ from Biological IQ
2. provide the null hypothesis that the two variables are independent
3. describe how to generate the null distribution; here it is done by permuting the response (Foster IQ) ten times
4. calculate the statistic of interest, here the slope
As you can see, sometimes the slope of the permuted data is positive, sometimes it is negative.
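The four steps above map onto the infer verbs `specify()`, `hypothesize()`, `generate()`, and `calculate()`. The sketch below assumes the dplyr and infer packages are installed and runs on simulated stand-in data (`twins_sim` is hypothetical, not Burt's data), so the slopes it prints will not match the ones in the video.

```r
library(dplyr)
library(infer)

# Hypothetical stand-in data: simulated IQ pairs, NOT Burt's measurements.
set.seed(3)
twins_sim <- tibble(Biological = rnorm(30, mean = 100, sd = 15)) %>%
  mutate(Foster = Biological + rnorm(30, mean = 0, sd = 7))

perm_slopes <- twins_sim %>%
  specify(Foster ~ Biological) %>%           # 1. response ~ explanatory
  hypothesize(null = "independence") %>%     # 2. null: no relationship
  generate(reps = 10, type = "permute") %>%  # 3. ten permutations
  calculate(stat = "slope")                  # 4. one slope per permutation

perm_slopes
```

The result has one row per permutation, with the permuted slope stored in the `stat` column.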
10. Many permuted slopes
The permuted slopes can be visualized in a histogram. We see that the permuted slopes are centered around zero and vary by approximately plus or minus 0.5.
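One way to draw such a histogram is with ggplot2, sketched here under the same assumptions as before: simulated stand-in data rather than Burt's, and 500 permutations chosen arbitrarily for the illustration. The spread of the slopes will therefore differ from the figure in the video.

```r
library(dplyr)
library(infer)
library(ggplot2)

# Hypothetical stand-in data: simulated IQ pairs, NOT Burt's measurements.
set.seed(4)
twins_sim <- tibble(Biological = rnorm(30, mean = 100, sd = 15)) %>%
  mutate(Foster = Biological + rnorm(30, mean = 0, sd = 7))

# Many permuted slopes (500 is an arbitrary choice for this sketch).
perm_slopes <- twins_sim %>%
  specify(Foster ~ Biological) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500, type = "permute") %>%
  calculate(stat = "slope")

# Histogram of the permuted slopes; it should be centered near zero.
slope_hist <- ggplot(perm_slopes, aes(x = stat)) +
  geom_histogram(bins = 30) +
  labs(x = "Permuted slope", y = "Count")

slope_hist
```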
11. Permuted slopes with observed slope in red
The observed slope was 0.9, which is nowhere close to the values obtained by permuting the data. The comparison tells us that a slope as large as the observed one would be very unlikely to arise from a chance permutation if the null hypothesis were true.
Notice in the R code that the observed slope is obtained by tidying the fitted linear model and extracting the estimate with the pull() function.
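A sketch of that extraction, followed by a numeric version of the comparison, might look like the following. It assumes the dplyr, broom, and infer packages and again uses simulated stand-in data, so the observed slope here will not be 0.9. The final proportion is an addition for illustration and is not taken from the video.

```r
library(dplyr)
library(broom)
library(infer)

# Hypothetical stand-in data: simulated IQ pairs, NOT Burt's measurements.
set.seed(5)
twins_sim <- tibble(Biological = rnorm(30, mean = 100, sd = 15)) %>%
  mutate(Foster = Biological + rnorm(30, mean = 0, sd = 7))

# Observed slope: fit the model, tidy it, keep the slope row, pull the estimate.
obs_slope <- lm(Foster ~ Biological, data = twins_sim) %>%
  tidy() %>%
  filter(term == "Biological") %>%
  pull(estimate)

# Slopes under the null hypothesis (500 permutations, an arbitrary choice).
perm_slopes <- twins_sim %>%
  specify(Foster ~ Biological) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500, type = "permute") %>%
  calculate(stat = "slope")

# Proportion of permuted slopes at least as far from zero as the observed one.
prop_extreme <- perm_slopes %>%
  summarize(prop = mean(abs(stat) >= abs(obs_slope))) %>%
  pull(prop)

obs_slope
prop_extreme
```

A proportion near zero says the same thing as the red line on the histogram: the observed slope sits far outside the range of the permuted slopes.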
12. Let's practice!
Thanks for following along with this video. Now it is your turn to practice!