1. Tidy your models with broom
Now that you know how to work with list columns in a tidy manner, you can begin to work with the tools you need to explore and evaluate machine learning models.
2. List Column Workflow
As you can probably imagine, the bulk of the work of machine learning resides in step two of this workflow. Since you can store complex model objects in your data frame, you can also work with these objects using the tools available in various R packages.
3. List Column Workflow
In this video, we will focus on the broom package, which is designed to convert useful model outputs into tidy data frames.
4. Broom Toolkit
The core of broom is encapsulated by three functions which aim to extract conceptually different information from any model.
- tidy() is used to extract the statistical findings of a model.
- glance() provides a one-row summary of a model, and
- augment() appends the predicted values of a model to the data being modeled.
Let's explore each of these in greater detail by reviewing the results of the linear model that you created for Algeria.
5. Summary of algeria_model
If you look at the summary() of the Algeria model you can see that there is a lot of useful information here.
However, extracting this information directly from the object is not nearly as easy as simply printing it.
But using tidy() and glance() you can easily extract this information into data frames.
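To see why pulling values out of a summary object is less convenient, here is a minimal sketch using base R only. The data frame toy_df and its values are hypothetical stand-ins, not the real Algeria data, and toy_model is an illustrative name.

```r
# Hypothetical stand-in data; these are NOT real Algeria figures.
toy_df <- data.frame(
  year = seq(1960, 2010, by = 10),
  life_expectancy = c(47, 52, 58, 66, 71, 74)
)

# A simple linear model of life expectancy over time.
toy_model <- lm(life_expectancy ~ year, data = toy_df)
model_summary <- summary(toy_model)

# Coefficients come back as a numeric matrix with terms stored as row names.
coef_matrix <- coef(model_summary)

# Fit statistics live in separate list elements of the summary object.
r_squared <- model_summary$r.squared

print(coef_matrix)
print(r_squared)
```

Each piece sits in a different place and in a different shape, which is the friction that tidy() and glance() are meant to remove.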
6. tidy()
The tidy() function collects the statistical findings of a model into a data frame.
When used with a linear model, tidy() returns the coefficients and their corresponding statistics for that model.
7. tidy()
To extract these statistics you simply apply the tidy() function to the model object as shown here.
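As a rough sketch of what that call might look like, the example below fits a linear model to a small made-up data frame and passes it to tidy(). The names toy_df and toy_model and the data values are assumptions for illustration; they are not the Algeria data from the course.

```r
library(broom)

# Hypothetical stand-in data; these are NOT real Algeria figures.
toy_df <- data.frame(
  year = seq(1960, 2010, by = 10),
  life_expectancy = c(47, 52, 58, 66, 71, 74)
)
toy_model <- lm(life_expectancy ~ year, data = toy_df)

# One row per model term, with the estimate and its statistics as columns.
coef_df <- tidy(toy_model)
print(coef_df)
```

For a linear model, the result has one row for the intercept and one for year, with columns such as term, estimate, std.error, statistic, and p.value.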
8. glance()
The next broom function, glance(), is used to return a one row summary of a model.
For a linear model, this summary contains various statistics about the fit of the model, such as the R-squared.
9. glance()
Extracting this information into a data frame is as simple as calling the function on the model object.
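A minimal sketch of that call, again on a hypothetical data frame rather than the course's Algeria data (toy_df, toy_model, and the values are assumed for illustration):

```r
library(broom)

# Hypothetical stand-in data; these are NOT real Algeria figures.
toy_df <- data.frame(
  year = seq(1960, 2010, by = 10),
  life_expectancy = c(47, 52, 58, 66, 71, 74)
)
toy_model <- lm(life_expectancy ~ year, data = toy_df)

# A single-row data frame of model-level fit statistics.
fit_df <- glance(toy_model)
print(fit_df)

# Individual statistics are ordinary columns, so they are easy to pull out.
print(fit_df$r.squared)
```

Because the result is always one row, these summaries stack neatly when you later collect them for many models.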
10. augment()
Finally, the augment() function builds an observation-level data frame containing the original data used to build the model as well as the predicted value for each observation in the column .fitted. Furthermore, augment() appends model-specific statistics of fit for each observation.
By constructing a data frame containing both the original values and those predicted by your model, you can explore the fit of the model.
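Here is a small sketch of augment() under the same assumptions as before: toy_df is made-up stand-in data, not the Algeria data, and the object names are illustrative.

```r
library(broom)

# Hypothetical stand-in data; these are NOT real Algeria figures.
toy_df <- data.frame(
  year = seq(1960, 2010, by = 10),
  life_expectancy = c(47, 52, 58, 66, 71, 74)
)
toy_model <- lm(life_expectancy ~ year, data = toy_df)

# One row per observation: the modeled columns plus .fitted, .resid,
# and other per-observation diagnostics.
augmented_df <- augment(toy_model)
print(augmented_df)
```

The output keeps life_expectancy and year and adds columns whose names start with a dot, so they cannot be confused with the original variables.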
11. Plotting Augmented Data
For instance, you can visualize how well your model fits the data by plotting the predicted and actual values of life expectancy with respect to year.
In this plot the actual values are the black points and the fit of the model, or predicted values, is shown as the red line.
By examining this plot, you can see that a simple linear model may not be the best approach for this example. You might consider either including more features or using a non-linear approach to better capture this relationship.
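A plot like the one described could be built roughly as follows. This is a sketch that assumes ggplot2 is available and uses hypothetical stand-in data, so the shape of the curve will not match the real Algeria plot.

```r
library(broom)
library(ggplot2)

# Hypothetical stand-in data; these are NOT real Algeria figures.
toy_df <- data.frame(
  year = seq(1960, 2010, by = 10),
  life_expectancy = c(47, 52, 58, 66, 71, 74)
)
toy_model <- lm(life_expectancy ~ year, data = toy_df)
augmented_df <- augment(toy_model)

# Actual values as black points, fitted values as a red line.
fit_plot <- ggplot(augmented_df, aes(x = year)) +
  geom_point(aes(y = life_expectancy)) +
  geom_line(aes(y = .fitted), color = "red")

print(fit_plot)
```

Points that sit consistently above or below the red line over a range of years are a hint that a straight line is missing some structure in the data.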
12. Let's use broom!
Using these three tools makes it easy to extract model coefficients, fit statistics, and observation-level performance for many different machine learning models.
In chapter two, we will use broom as part of the list column workflow to do this for all 77 of our country-level models with just a few lines of code. But first, let's review what you have learned with a few exercises.