1. Different types of intervals
Throughout the different introductory courses, you have repeatedly seen confidence intervals which attempt to capture the true population parameter of interest. In the previous exercises, you created confidence intervals for the true slope and intercept parameters.
However, in regression, it is often of interest to construct an interval estimate for the average response or for an individual predicted response.
2. Fat & calories, a linear model
Recall that the variability of the linear model is based on both the slope and intercept changing from sample to sample.
3. Fat & calories, many linear models
Indeed, the predicted line varies more at the extreme values of fat. That is, it is harder to predict the average calories of food items that have zero fat than those that have 20g of fat.
4. Predicting average calories at specific fat
The augment function (in the broom package) calculates the standard error for the predicted average at each value of the explanatory variable of interest. The variability at zero and 30 grams of fat is much higher than the variability at 10 or 20 grams of fat.
As before, the CI for the average calories (as a parameter value) is the statistic plus or minus the standard error times the critical value. The critical value here is calculated for a 95% interval with degrees of freedom equal to the total number of observations minus two.
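To make the "statistic plus or minus critical value times standard error" recipe concrete, here is a minimal sketch. It makes two assumptions that are not from the video: the data are simulated stand-ins (the `foods` data frame and its numbers are hypothetical, not the Starbucks data), and it uses base R's `predict()` rather than the broom `augment` call.

```r
# Simulated stand-in data (hypothetical; NOT the Starbucks data from the video)
set.seed(47)
foods <- data.frame(fat = runif(60, min = 0, max = 30))
foods$calories <- 180 + 11 * foods$fat + rnorm(60, sd = 60)

model <- lm(calories ~ fat, data = foods)

# Fat values at which to estimate the average calories
new_fat <- data.frame(fat = c(0, 10, 20, 30))

# Fitted averages and their standard errors
pred <- predict(model, newdata = new_fat, se.fit = TRUE)

# Critical value for a 95% interval with n - 2 degrees of freedom
crit_val <- qt(0.975, df = nrow(foods) - 2)

# Statistic plus or minus critical value times standard error
ci <- data.frame(
  fat      = new_fat$fat,
  estimate = unname(pred$fit),
  se       = unname(pred$se.fit),
  lower    = unname(pred$fit - crit_val * pred$se.fit),
  upper    = unname(pred$fit + crit_val * pred$se.fit)
)
ci
```

In the `se` column you should expect larger values at 0 and 30 grams than at 10 and 20 grams, because the standard error grows as the fat value moves away from the sample mean.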
Notice that the CI at zero grams of fat is the same as the CI for the intercept.
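The match between the two intervals can be checked directly. The sketch below again assumes simulated stand-in data (the hypothetical `foods` data frame, not the Starbucks data) and uses base R's `confint()` and `predict()`.

```r
# Simulated stand-in data (hypothetical; NOT the Starbucks data from the video)
set.seed(47)
foods <- data.frame(fat = runif(60, min = 0, max = 30))
foods$calories <- 180 + 11 * foods$fat + rnorm(60, sd = 60)

model <- lm(calories ~ fat, data = foods)

# 95% CI for the intercept parameter
ci_intercept <- confint(model, level = 0.95)["(Intercept)", ]

# 95% CI for the average response at zero grams of fat
ci_at_zero <- predict(model, newdata = data.frame(fat = 0),
                      interval = "confidence", level = 0.95)

ci_intercept
ci_at_zero[1, c("lwr", "upr")]
```

The two printed intervals agree because the fitted average at zero fat is the estimated intercept, with the same standard error and the same degrees of freedom.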
The 95% confidence interval representing the average calorie content for foods with 10g of fat is 258.7 calories to 292.4 calories.
Additionally, you can see the intervals which capture the average calorie content with 95% confidence for foods with 20g of fat and 30g of fat.
5. Creating CI for average response
In order to produce a confidence bound for the entire linear model, the interval is calculated for every observation in the dataset.
6. Plotting CI for average response
We have plotted the interval using geom_ribbon in ggplot2, but the same plot could have been created using se = TRUE in the stat_smooth call.
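A sketch of both plotting routes follows. It assumes simulated stand-in data (the hypothetical `foods` data frame, not the Starbucks data) and gets the interval limits from base R's `predict()`. The column names `fit`, `lwr`, and `upr` are the ones `predict()` returns.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical; NOT the Starbucks data from the video)
set.seed(47)
foods <- data.frame(fat = runif(60, min = 0, max = 30))
foods$calories <- 180 + 11 * foods$fat + rnorm(60, sd = 60)

model <- lm(calories ~ fat, data = foods)

# Confidence interval for the average response at every observation
band <- predict(model, interval = "confidence", level = 0.95)
plot_data <- cbind(foods, band)  # adds columns fit, lwr, upr

# Route 1: draw the band explicitly with geom_ribbon
p_ribbon <- ggplot(plot_data, aes(x = fat, y = calories)) +
  geom_point() +
  geom_line(aes(y = fit)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2)

# Route 2: let stat_smooth compute and draw the band
p_smooth <- ggplot(foods, aes(x = fat, y = calories)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = TRUE)

p_ribbon
```

Printing `p_smooth` instead should give an equivalent band, since `stat_smooth` uses a 95% level by default.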
7. Prediction intervals
Prediction intervals give a range of plausible values for the individual observations at a given level of the explanatory variable. For example, a prediction interval gives a range expected to contain the calorie counts of 95% of Starbucks foods with 10g of fat.
Note that the difference between prediction intervals for individual responses and confidence intervals for average responses lies in the standard error that is used.
The standard error of the individual prediction is a combination of how variable the line is and how variable the individual points are.
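One common way to write that combination is as the square root of the squared standard error of the line plus the squared residual standard deviation. The sketch below computes it on simulated stand-in data (the hypothetical `foods` data frame, not the Starbucks data) with base R's `predict()`. The variable names are illustrative.

```r
# Simulated stand-in data (hypothetical; NOT the Starbucks data from the video)
set.seed(47)
foods <- data.frame(fat = runif(60, min = 0, max = 30))
foods$calories <- 180 + 11 * foods$fat + rnorm(60, sd = 60)

model <- lm(calories ~ fat, data = foods)
new_fat <- data.frame(fat = c(0, 10, 20, 30))

pred <- predict(model, newdata = new_fat, se.fit = TRUE)

se_line   <- unname(pred$se.fit)  # how variable the line is
sigma_hat <- pred$residual.scale  # how variable the individual points are

# Standard error for an individual prediction combines both sources
se_individual <- sqrt(se_line^2 + sigma_hat^2)

crit_val <- qt(0.975, df = pred$df)  # df is n - 2

pi_manual <- data.frame(
  fat   = new_fat$fat,
  lower = unname(pred$fit) - crit_val * se_individual,
  upper = unname(pred$fit) + crit_val * se_individual
)
pi_manual
```

Because the residual term is added at every fat value, each prediction interval is wider than the confidence interval for the average at the same fat value.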
8. Plotting prediction intervals
We plot the prediction interval bounds using the geom_ribbon function with the new interval upper and lower limits.
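A minimal sketch of such a plot is below. It assumes simulated stand-in data (the hypothetical `foods` data frame, not the Starbucks data), and it evaluates the limits on an evenly spaced grid of fat values, which is an illustrative choice rather than the approach confirmed by the video.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical; NOT the Starbucks data from the video)
set.seed(47)
foods <- data.frame(fat = runif(60, min = 0, max = 30))
foods$calories <- 180 + 11 * foods$fat + rnorm(60, sd = 60)

model <- lm(calories ~ fat, data = foods)

# Prediction interval limits on a grid of fat values
grid <- data.frame(fat = seq(0, 30, by = 0.5))
pi_band <- cbind(grid, predict(model, newdata = grid,
                               interval = "prediction", level = 0.95))

p_pred <- ggplot(foods, aes(x = fat, y = calories)) +
  geom_point() +
  geom_ribbon(data = pi_band, aes(x = fat, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_line(data = pi_band, aes(x = fat, y = fit), inherit.aes = FALSE)

p_pred
```

Most of the plotted points should fall inside this ribbon, in contrast to the much narrower ribbon for the average response.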
9. Let's practice!
Thanks for following along with this video, now it is your turn to practice!