1. Intervals in regression
Hypothesis testing of slopes is only one part of our inferential toolbox. Often we are more interested in quantifying the slope in the population than in knowing whether it is different from zero.
2. Starbucks: fat & calories
For the Starbucks data, it seems obvious that there is a linear relationship between fat and calories. What is less obvious is the value of the true slope of the linear model that describes the population. We call the true population slope the parameter, and as part of estimating the parameter, we want to gauge how closely the sample slope estimates the true population slope.
3. Fat & calories, a linear model
Here, the hypothesis could be stated as a question: does the linear model relating calories to fat have a slope of zero? A slope of zero would indicate that regardless of the amount of fat in a particular food, the linear model predicts the same number of calories.
But what if you are interested in estimating the number of extra calories for foods with additional grams of fat? Instead of running a hypothesis test on the slope, your goal would be to find a confidence interval for the slope.
4. Confidence interval for slope and intercept parameters
As you've seen in previous chapters, a confidence interval for a given parameter can be created using the estimate of the parameter and the standard error of that estimate. The estimate and the SE are both given in the tidy output of the linear model.
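Written out, the interval takes the following form. This is a sketch that assumes the standard t-based interval for simple linear regression; the symbols are notation introduced here for illustration, with b_0 and b_1 the sample intercept and slope, SE their standard errors, n the number of observations, and t* the critical value from a t-distribution with n - 2 degrees of freedom (the 0.975 quantile for a 95% interval).

```latex
\[
b_0 \pm t^{*}_{n-2} \cdot SE(b_0),
\qquad
b_1 \pm t^{*}_{n-2} \cdot SE(b_1)
\]
```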
5. Confidence interval for intercept parameter
The confidence interval for the intercept parameter goes from 118.3 kcal to 177.7 kcal. In this particular case, we can interpret the intercept in terms of the problem, because zero fat is a plausible value for the explanatory variable. Oftentimes, a zero value represents extrapolation outside of the plausible range, so it is important to be careful with the interpretation of the intercept.
Here we say that for the set of food items with zero fat, we are 95% confident that their average calorie content is between 118.3 and 177.7 kcal.
6. Confidence interval for slope parameter
The confidence interval for the slope parameter is 11.1 to 14.4. That means, for each set of foods with one additional gram of fat, we expect the average calorie content to be an additional 11.1 to 14.4 kcal.
Note that a regression model cannot tell you anything about causation. That is, we can't say that one additional gram of fat *adds* between 11.1 and 14.4 calories to a food item. Of course, in this case, nutritional science can tell us how many calories a gram of fat has, but that doesn't mean that an additional gram of fat will add exactly that amount to a specific food item, because the make-up of the food item includes many other components as well.
The correct interpretation of the confidence interval for the slope parameter is that you are 95% confident that foods with one more gram of fat will have, on average, between 11.1 and 14.4 additional calories.
7. Bootstrap interval for slope
In addition to using the mathematical theory to create a confidence interval for the slope, you can also use bootstrap sampling to estimate how much the slope varies from one resample to the next.
In the example above, you can see that the bootstrap confidence interval ranges from 11.2 to 14.3, very close to the mathematical approximation, which went from 11.1 to 14.4.
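The bootstrap idea can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the code behind the slide: the function names `ols_slope` and `bootstrap_slope_ci` are hypothetical, the percentile method is one of several ways to form a bootstrap interval, and the fat and calorie values are simulated from arbitrary made-up numbers rather than taken from the Starbucks data.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope_ci(x, y, reps=2000, seed=1):
    """95% percentile bootstrap interval for the slope.

    Resamples (x, y) pairs with replacement, refits the slope on each
    resample, and returns the 2.5th and 97.5th percentiles of the slopes.
    """
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined when every resampled x is identical
        ys = [y[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    # n=40 gives cut points every 2.5%; the first and last bound the middle 95%
    cuts = statistics.quantiles(slopes, n=40)
    return cuts[0], cuts[-1]


# Simulated data (NOT the Starbucks data): the intercept, slope, and noise
# level below are arbitrary values chosen only to have something to resample.
data_rng = random.Random(42)
fat = [data_rng.uniform(0, 30) for _ in range(50)]
calories = [150 + 12.5 * f + data_rng.gauss(0, 40) for f in fat]

lower, upper = bootstrap_slope_ci(fat, calories)
print(f"sample slope: {ols_slope(fat, calories):.2f}")
print(f"95% bootstrap interval: {lower:.2f} to {upper:.2f}")
```

Resampling whole (fat, calories) pairs keeps each food item's two measurements together, so the spread of the refitted slopes reflects sampling variability in the relationship itself.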
8. Let's practice!
Thanks for following along with this video. Now it is your turn to practice!