1. Working with model objects
The model objects created by lm contain a lot of information. In this video, you'll see how to extract it.
2. coefficients()
Here's the model of mass versus length in the bream dataset. Printing it shows the code used to create it, and the coefficients. To use those coefficients in the rest of your analysis, you need to be able to extract them from the object.
You do this with the coefficients function. It returns a named numeric vector, where each name identifies a coefficient: here, the intercept and the slope for length.
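To make this concrete, here is a minimal sketch. The real bream data isn't reproduced here, so the data frame below is a small stand-in with made-up values, and the names bream, mass_g, length_cm, and mdl_mass_vs_length are assumptions chosen for illustration.

```r
# Stand-in for the bream data: hypothetical values, NOT the real dataset.
# Column and object names are assumptions for illustration.
bream <- data.frame(
  length_cm = c(23.2, 24.0, 26.3, 29.0, 31.2, 33.5, 35.0, 38.0),
  mass_g    = c(242, 290, 363, 450, 500, 610, 700, 955)
)

# Model of mass versus length
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)

# Extract the coefficients as a named numeric vector
coeffs <- coefficients(mdl_mass_vs_length)
print(names(coeffs))

# Pull out individual coefficients by name
intercept <- coeffs[["(Intercept)"]]
slope     <- coeffs[["length_cm"]]

# Use them in later calculations, e.g. a manual prediction for a 30 cm fish
manual_prediction <- intercept + slope * 30
print(manual_prediction)
```

Extracting by name rather than by position keeps the code readable if the model formula changes later.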
3. fitted()
"Fitted values" is jargon for predictions on the original dataset used to create the model. Access them with the fitted function.
The result is a numeric vector of length thirty-five, which is the number of rows in the bream dataset.
The fitted function is essentially a shortcut for taking the explanatory variable columns from the dataset, then feeding them to the predict function.
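You can check that equivalence yourself. The sketch below again uses a small stand-in data frame with made-up values rather than the real bream data, and the object and column names are assumptions.

```r
# Stand-in for the bream data: hypothetical values, NOT the real dataset.
bream <- data.frame(
  length_cm = c(23.2, 24.0, 26.3, 29.0, 31.2, 33.5, 35.0, 38.0),
  mass_g    = c(242, 290, 363, 450, 500, 610, 700, 955)
)
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)

# Fitted values: predictions on the data used to create the model
fitted_values <- fitted(mdl_mass_vs_length)

# The long way: take the explanatory column, then feed it to predict()
explanatory_data <- bream["length_cm"]
predicted_values <- predict(mdl_mass_vs_length, newdata = explanatory_data)

# Compare the two results, ignoring element names
same_result <- all.equal(unname(fitted_values), unname(predicted_values))
print(same_result)

# One fitted value per row of the dataset
print(length(fitted_values) == nrow(bream))
```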
4. residuals()
"Residuals" are a measure of inaccuracy in the model fit, and are accessed with the residuals function. Like fitted values, there is one residual for each row of the dataset.
Each residual is the actual response value minus the predicted response value. In this case, the residuals are the masses of breams, minus the fitted values.
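Here is a short sketch of that calculation, checking that the manual subtraction matches what the residuals function returns. The data frame is a stand-in with made-up values, and its names are assumptions.

```r
# Stand-in for the bream data: hypothetical values, NOT the real dataset.
bream <- data.frame(
  length_cm = c(23.2, 24.0, 26.3, 29.0, 31.2, 33.5, 35.0, 38.0),
  mass_g    = c(242, 290, 363, 450, 500, 610, 700, 955)
)
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)

# Residuals straight from the model object
model_residuals <- residuals(mdl_mass_vs_length)

# Residuals by hand: actual response minus fitted value
manual_residuals <- bream$mass_g - fitted(mdl_mass_vs_length)

# The two should agree, one residual per row
residuals_match <- all.equal(unname(model_residuals), unname(manual_residuals))
print(residuals_match)
```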
You'll see more on how to use the fitted values and residuals to assess the quality of your model in Chapter 3.
5. summary()
The summary function shows a more extended printout of the details of the model.
Let's step through this piece by piece.
6. summary(): call
First, you see the code you used to create the model.
7. summary(): residuals
Then you see some summary statistics of the residuals. If the model is a good fit, the residuals should follow a normal distribution. Look at the median, and see if the number is close to zero. Then look at the first and third quartiles and see if they have about the same absolute value. That is, the number labeled 1Q should be roughly the negative of the number labeled 3Q.
You can get a more accurate sense of this by drawing plots, but this is a quick check.
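If you want those same numbers as values you can work with, one option is to compute the quantiles of the residuals yourself. This is a sketch on stand-in data with made-up values, so the numbers it prints say nothing about the real bream model.

```r
# Stand-in for the bream data: hypothetical values, NOT the real dataset.
bream <- data.frame(
  length_cm = c(23.2, 24.0, 26.3, 29.0, 31.2, 33.5, 35.0, 38.0),
  mass_g    = c(242, 290, 363, 450, 500, 610, 700, 955)
)
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)

# Minimum, first quartile, median, third quartile, maximum of the residuals
resid_quantiles <- quantile(residuals(mdl_mass_vs_length))
print(resid_quantiles)

# Quick check 1: is the median close to zero?
resid_median <- resid_quantiles[["50%"]]

# Quick check 2: do 1Q and 3Q have about the same absolute value?
first_quartile <- resid_quantiles[["25%"]]
third_quartile <- resid_quantiles[["75%"]]
print(c(median = resid_median,
        abs_1Q = abs(first_quartile),
        abs_3Q = abs(third_quartile)))
```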
8. summary(): coefficients
Next you see details of the coefficients. The numbers in the first column are the ones returned by the coefficients function. The numbers in the last column are the p-values, which refer to statistical significance. These are beyond the scope of this course, but you can learn about them in DataCamp's courses on inference.
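To see the link between the first column and the coefficients function, you can pull the coefficient table out of the summary object as a matrix. This sketch uses stand-in data with made-up values, and the object names are assumptions.

```r
# Stand-in for the bream data: hypothetical values, NOT the real dataset.
bream <- data.frame(
  length_cm = c(23.2, 24.0, 26.3, 29.0, 31.2, 33.5, 35.0, 38.0),
  mass_g    = c(242, 290, 363, 450, 500, 610, 700, 955)
)
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)

# The coefficient table from the summary, as a numeric matrix
coef_table <- coef(summary(mdl_mass_vs_length))
print(colnames(coef_table))

# The first column holds the same numbers that coefficients() returns
estimates_match <- all.equal(coef_table[, "Estimate"],
                             coefficients(mdl_mass_vs_length))
print(estimates_match)

# The last column holds the p-values
p_values <- coef_table[, "Pr(>|t|)"]
print(p_values)
```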
9. summary(): model metrics
Finally, there are some metrics on the performance of the model. These will be discussed in the next chapter.
10. tidy()
While summary shows lots of information, it is designed to be read, not to be manipulated with code. In R, functions intended for programmatic use should return either a vector, as coefficients, fitted, and residuals do, or a data frame.
The broom package provides functions that return data frames. This makes the model results easy to manipulate with dplyr, ggplot2, and other tidyverse packages.
The tidy function returns the coefficient details in a data frame.
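A minimal sketch of tidy in use follows. It assumes the broom package is installed, and it runs on stand-in data with made-up values rather than the real bream dataset.

```r
library(broom)

# Stand-in for the bream data: hypothetical values, NOT the real dataset.
bream <- data.frame(
  length_cm = c(23.2, 24.0, 26.3, 29.0, 31.2, 33.5, 35.0, 38.0),
  mass_g    = c(242, 290, 363, 450, 500, 610, 700, 955)
)
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)

# Coefficient details as a data frame, one row per coefficient
coef_df <- tidy(mdl_mass_vs_length)
print(coef_df)

# Because it's a data frame, ordinary subsetting works
slope_row <- coef_df[coef_df$term == "length_cm", ]
print(slope_row$estimate)
```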
11. augment()
augment returns observation-level results. You get one row for each row of the data frame used to create the model.
On the left, you can see the mass and length variables that we used to create the model. Dot-fitted contains the fitted values, and dot-resid contains the residuals.
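Here is a sketch of augment on stand-in data with made-up values. It assumes broom is installed, and the column names mass_g and length_cm are assumptions standing in for the mass and length variables.

```r
library(broom)

# Stand-in for the bream data: hypothetical values, NOT the real dataset.
bream <- data.frame(
  length_cm = c(23.2, 24.0, 26.3, 29.0, 31.2, 33.5, 35.0, 38.0),
  mass_g    = c(242, 290, 363, 450, 500, 610, 700, 955)
)
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)

# Observation-level results: one row per row of the modeling data
obs_df <- augment(mdl_mass_vs_length)
print(names(obs_df))

# .fitted and .resid hold the same numbers as fitted() and residuals()
fitted_match <- all.equal(obs_df$.fitted, unname(fitted(mdl_mass_vs_length)))
resid_match  <- all.equal(obs_df$.resid, unname(residuals(mdl_mass_vs_length)))
print(c(fitted_match, resid_match))
```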
12. glance()
glance returns model-level results. These are the performance metrics that you saw near the bottom of the summary output, plus a few others.
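And a sketch of glance, again assuming broom is installed and using stand-in data with made-up values. The comparison against the summary object is just one way to confirm that the metrics line up.

```r
library(broom)

# Stand-in for the bream data: hypothetical values, NOT the real dataset.
bream <- data.frame(
  length_cm = c(23.2, 24.0, 26.3, 29.0, 31.2, 33.5, 35.0, 38.0),
  mass_g    = c(242, 290, 363, 450, 500, 610, 700, 955)
)
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)

# Model-level results: a data frame with a single row
metrics_df <- glance(mdl_mass_vs_length)
print(names(metrics_df))

# The r.squared column matches the value stored in the summary object
rsq_match <- all.equal(metrics_df$r.squared,
                       summary(mdl_mass_vs_length)$r.squared)
print(rsq_match)
```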
13. Let's practice!
Your turn to extract some model elements.