1. Variability of coefficients
In the previous example, you fit a linear model regressing the volume of bicycle riders on the daily high temperature. As already pointed out, the variability associated with the slope is reported in the model output as the `std.error`.
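As a quick reminder of where that number lives, here is a minimal sketch in base R. The data are simulated stand-ins (an assumption, not the real trail data), and the variable names `hightemp` and `volume` are chosen only for illustration. Note that `std.error` is the column name used by `broom::tidy()`, while base R's `summary()` labels the same column "Std. Error".

```r
# Hypothetical data standing in for the bike-trail example (not the real values).
set.seed(1)
hightemp <- runif(90, min = 40, max = 95)             # daily high temperature
volume   <- 20 + 5 * hightemp + rnorm(90, sd = 100)   # riders, with random noise
fit <- lm(volume ~ hightemp)

# Coefficient table: one row per term, with "Estimate" and "Std. Error" columns.
coef_table <- summary(fit)$coefficients
slope_est  <- coef_table["hightemp", "Estimate"]
slope_se   <- coef_table["hightemp", "Std. Error"]
print(c(estimate = slope_est, std.error = slope_se))
```

The standard error printed here is the model's estimate of how much the slope would vary from sample to sample.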
In the next few exercises, we will investigate what parts of the model drive the variability of the sampling distribution of the slope.
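For reference, the textbook formula for the standard error of the slope in simple linear regression makes the candidate drivers explicit. The notation is an assumption introduced here, not taken from the video: $b_1$ is the estimated slope, $s$ is the residual standard deviation, $s_x$ is the standard deviation of the explanatory variable, and $n$ is the sample size.

$$
SE(b_1) = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = \frac{s}{s_x \sqrt{n - 1}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
$$

Read this way, the slope becomes more variable when the points scatter more around the line (larger $s$), when the x-values are less spread out (smaller $s_x$), or when the sample is smaller (smaller $n$).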
2. RailTrails - original data
Using a dataset taken from a bike trail in Massachusetts (available in the mosaic package in R), we've plotted the high temperature for the day and the volume of bicycle riders on the trail.
The original regression of volume on high temperature shows a reasonably strong positive linear association. However, the sample is only one of many that could have been drawn from the population, so the fitted line would differ somewhat from sample to sample. The sources of that sampling variability are what we will investigate in the next few exercises.
3. RailTrails - a change in sample size
In the images above, we consider the RailTrail data again, but this time we've repeatedly sampled from the original data with smaller sample sizes (n=10 on the left and n=50 on the right). We can see that with very small samples like n=10, the lines vary much more than with samples of size n=50. The original dataset has 90 observations, so a line fit to a sample of that size is likely to vary even less from sample to sample than the lines in the image on the right.
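The same comparison can be sketched as a small simulation. Everything here is an assumption made for illustration: the population is simulated rather than taken from the trail data, and the helper `sample_slope()` is a hypothetical name.

```r
set.seed(42)
# Hypothetical population (an assumption; not the RailTrail data).
pop_size <- 5000
population <- data.frame(hightemp = runif(pop_size, min = 40, max = 95))
population$volume <- 20 + 5 * population$hightemp + rnorm(pop_size, sd = 100)

# Fit volume ~ hightemp on one random sample of size n and return the slope.
sample_slope <- function(n) {
  rows <- sample(nrow(population), size = n)
  unname(coef(lm(volume ~ hightemp, data = population[rows, ]))["hightemp"])
}

# 500 repeated samples at each sample size.
slopes_n10 <- replicate(500, sample_slope(10))
slopes_n50 <- replicate(500, sample_slope(50))

# Spread of the fitted slopes across repeated samples.
sd_n10 <- sd(slopes_n10)
sd_n50 <- sd(slopes_n50)
print(c(n10 = sd_n10, n50 = sd_n50))
```

The standard deviation of the slopes from samples of size 10 should come out noticeably larger than the one from samples of size 50, mirroring the wider fan of lines in the left-hand image.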
4. RailTrails - less variability around the line
Here, the data themselves have been modified so that the "tighter" data are less variable around the line. When samples of size 50 are taken from the tighter data, each sample is mostly representative of the same model, and so the lines vary much less than the lines given by samples from the original data.
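One way to see this effect without resampling is to tighten a dataset by hand and compare the reported standard errors. The sketch below is an assumption about how such a modification could be done, not necessarily how the video's data were built: it uses simulated data and pulls every point halfway toward the fitted line.

```r
set.seed(7)
# Hypothetical data (an assumption; not the RailTrail values).
hightemp <- runif(90, min = 40, max = 95)
volume   <- 20 + 5 * hightemp + rnorm(90, sd = 100)
fit_orig <- lm(volume ~ hightemp)

# "Tighter" data: same fitted line, but each point is pulled halfway toward it.
volume_tight <- fitted(fit_orig) + 0.5 * resid(fit_orig)
fit_tight <- lm(volume_tight ~ hightemp)

# Standard error of the slope before and after tightening.
se_orig  <- summary(fit_orig)$coefficients["hightemp", "Std. Error"]
se_tight <- summary(fit_tight)$coefficients["hightemp", "Std. Error"]
print(c(original = se_orig, tighter = se_tight))
```

Because the residuals are halved while the x-values and the fitted line stay the same, the standard error of the slope is halved as well.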
5. RailTrails - less variability in the x direction
Now the data have been modified to have fewer observations in the extreme range of the x-variable, high temperature. That is, there are no days in the low 50s or high 80s. The effect of a narrower range of temperatures is to increase the variability in the regression lines. Somewhat counterintuitively, the variability in the slope increases as the variability in the high temperature decreases. This is because the extreme values of high temperature no longer act as an anchor for the model.
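A small simulation can make this counterintuitive result concrete. As before, the population is simulated (an assumption, not the trail data), the cutoffs of 60 and 80 degrees are hypothetical, and `sample_slopes()` is a helper name invented for this sketch.

```r
set.seed(123)
# Hypothetical population (an assumption; not the RailTrail data).
pop_size <- 5000
population <- data.frame(hightemp = runif(pop_size, min = 40, max = 95))
population$volume <- 20 + 5 * population$hightemp + rnorm(pop_size, sd = 100)

# Narrower population: drop days below 60 or above 80 degrees (hypothetical cutoffs).
narrow <- population[population$hightemp >= 60 & population$hightemp <= 80, ]

# Slopes from repeated samples of size n drawn from a data frame.
sample_slopes <- function(data, n = 50, reps = 500) {
  replicate(reps, {
    rows <- sample(nrow(data), size = n)
    unname(coef(lm(volume ~ hightemp, data = data[rows, ]))["hightemp"])
  })
}

# Same sample size in both cases; only the spread of hightemp differs.
sd_full   <- sd(sample_slopes(population))
sd_narrow <- sd(sample_slopes(narrow))
print(c(full_range = sd_full, narrow_range = sd_narrow))
```

With the sample size and the noise around the line held fixed, the slopes from the narrow-range population should be the more variable of the two.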
6. Let's practice!
Thanks for following along with this video. To practice, you will work with a hypothetical population. The population will change in specific ways that demonstrate when the sampling distribution is more or less variable. Now it is your turn to practice!