1. Automating the modeling workflow
In this section, we will focus on applying the last fit workflow that we learned in the regression section to streamline our classification modeling.
2. Streamlining the workflow
The last_fit() function also accepts classification models.
In a single call, this function fits a model to the training data and generates predictions on the test data, which speeds up the modeling workflow.
Before using the last_fit() function, we must create a data split object with rsample and specify our model with parsnip.
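Here is a minimal sketch of these two prerequisites. Because the course data is not included here, the code simulates a hypothetical stand-in for leads_df. The predictor columns total_visits and total_time, the coefficients used to generate purchased, the seed, and the 75/25 split are all illustrative assumptions. Note that purchased is a factor with "yes" as its first level, because yardstick treats the first factor level as the positive class by default.

```r
library(tidymodels)  # attaches rsample, parsnip, tune, yardstick, dplyr, tibble

# Hypothetical stand-in for leads_df: simulated data, NOT the course dataset
set.seed(42)
n <- 500
total_visits <- rpois(n, lambda = 5)
total_time <- round(runif(n, min = 0, max = 1500))
prob_yes <- plogis(-3 + 0.2 * total_visits + 0.002 * total_time)  # invented coefficients
leads_df <- tibble(
  # "yes" is listed first so yardstick treats it as the positive class
  purchased = factor(ifelse(runif(n) < prob_yes, "yes", "no"), levels = c("yes", "no")),
  total_visits = total_visits,
  total_time = total_time
)

# Data split object (rsample): 75% training, stratified by the outcome
leads_split <- initial_split(leads_df, prop = 0.75, strata = purchased)

# Model specification (parsnip): logistic regression using the glm engine
logistic_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

leads_split
logistic_model
```

Printing leads_split shows the training, testing, and total row counts; printing logistic_model shows the specification, which has not been fit to any data yet.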
3. Fitting the model and collecting metrics
To train our logistic regression model with last_fit(), we pass our logistic regression model object to last_fit(), followed by our model formula and our data split object, leads_split.
Once the model is trained, we can use the collect_metrics() function to view the performance metrics that last_fit() calculated on the test dataset.
The default metrics are accuracy and ROC AUC. Notice that we get the same performance metrics as before, just with a lot less effort!
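A minimal sketch of this step follows, under the same assumptions as before: leads_df is simulated here (hypothetical columns total_visits and total_time), and the formula purchased ~ total_visits + total_time is illustrative rather than the course's exact model. The block repeats the split and model specification so it runs on its own.

```r
library(tidymodels)

# Hypothetical stand-in for leads_df: simulated data, NOT the course dataset
set.seed(42)
n <- 500
total_visits <- rpois(n, lambda = 5)
total_time <- round(runif(n, min = 0, max = 1500))
prob_yes <- plogis(-3 + 0.2 * total_visits + 0.002 * total_time)
leads_df <- tibble(
  purchased = factor(ifelse(runif(n) < prob_yes, "yes", "no"), levels = c("yes", "no")),
  total_visits = total_visits,
  total_time = total_time
)

leads_split <- initial_split(leads_df, prop = 0.75, strata = purchased)

logistic_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# Fit on the training data and predict on the test data in one call:
# model object, then formula, then the data split object
logistic_last_fit <- logistic_model %>%
  last_fit(purchased ~ total_visits + total_time, split = leads_split)

# Test-set metrics; depending on the installed tune version,
# metrics beyond accuracy and ROC AUC may also be reported
logistic_last_fit %>% collect_metrics()
```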
4. Collecting predictions
Passing a trained last fit model object into the collect_predictions() function will create a tibble of model results on the test dataset.
The results tibble will contain all the required columns for calculating performance metrics with yardstick functions.
The important columns for our logistic regression model on the leads_df data are .pred_yes, .pred_no, .pred_class, and purchased.
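The following sketch shows collect_predictions() on the same hypothetical setup: simulated leads_df with assumed predictors total_visits and total_time. The returned tibble holds one row per test-set observation, with the class probabilities, the predicted class, and the true outcome side by side.

```r
library(tidymodels)

# Hypothetical stand-in for leads_df: simulated data, NOT the course dataset
set.seed(42)
n <- 500
total_visits <- rpois(n, lambda = 5)
total_time <- round(runif(n, min = 0, max = 1500))
prob_yes <- plogis(-3 + 0.2 * total_visits + 0.002 * total_time)
leads_df <- tibble(
  purchased = factor(ifelse(runif(n) < prob_yes, "yes", "no"), levels = c("yes", "no")),
  total_visits = total_visits,
  total_time = total_time
)

leads_split <- initial_split(leads_df, prop = 0.75, strata = purchased)

logistic_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

logistic_last_fit <- logistic_model %>%
  last_fit(purchased ~ total_visits + total_time, split = leads_split)

# Test-set results: .pred_yes, .pred_no, .pred_class, and purchased,
# plus bookkeeping columns such as .row and .config
last_fit_results <- logistic_last_fit %>% collect_predictions()
last_fit_results
```

Because this is a two-class problem, .pred_yes and .pred_no sum to 1 in every row.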
5. Custom metric sets
When a custom metric function created with metric_set() includes yardstick's roc_auc() function, we must make a special adjustment in how we call it.
If we want a metric set with accuracy, sensitivity, specificity, and ROC AUC, we must remember that accuracy(), sens(), and spec() take slightly different arguments than roc_auc(). The accuracy(), sens(), and spec() functions require a truth column and an estimate column of predicted classes, while roc_auc() requires a truth column and a column of estimated probabilities for the positive class. By default, yardstick treats the first level of the truth factor as the positive class.
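To make this argument difference concrete, here is a minimal sketch on a tiny hand-made tibble. The six rows and their probabilities are invented for illustration; only the column names mirror the collect_predictions() output. The class metrics receive truth and estimate, while roc_auc() receives truth plus the unnamed probability column.

```r
library(yardstick)
library(tibble)

# Invented example rows; "yes" is the first level, so it is the positive class
toy_results <- tibble(
  purchased   = factor(c("yes", "yes", "no", "no", "yes", "no"), levels = c("yes", "no")),
  .pred_class = factor(c("yes", "no", "no", "no", "yes", "yes"), levels = c("yes", "no")),
  .pred_yes   = c(0.9, 0.4, 0.2, 0.1, 0.8, 0.6)
)

# Class metrics: truth column plus predicted classes
accuracy(toy_results, truth = purchased, estimate = .pred_class)
sens(toy_results, truth = purchased, estimate = .pred_class)
spec(toy_results, truth = purchased, estimate = .pred_class)

# Probability metric: truth column plus the positive-class probability column
roc_auc(toy_results, truth = purchased, .pred_yes)
```

In this toy data, four of six predicted classes are correct (accuracy 4/6), two of the three "yes" rows and two of the three "no" rows are classified correctly (sensitivity and specificity both 2/3), and in 8 of the 9 yes/no pairs the "yes" row has the higher .pred_yes, so ROC AUC is 8/9.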
For our last_fit_results tibble, the truth column is purchased, the estimate column is .pred_class, and the estimated probabilities for the positive class are in the .pred_yes column.
All three of these must be passed to our custom metric function: truth and estimate by name, and .pred_yes unnamed as the last argument.
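Putting it together, the sketch below builds a metric set with accuracy(), sens(), spec(), and roc_auc() and applies it to the last fit predictions. As in the earlier steps, leads_df is a simulated, hypothetical stand-in with assumed predictors total_visits and total_time, and the name custom_metrics is illustrative.

```r
library(tidymodels)

# Hypothetical stand-in for leads_df: simulated data, NOT the course dataset
set.seed(42)
n <- 500
total_visits <- rpois(n, lambda = 5)
total_time <- round(runif(n, min = 0, max = 1500))
prob_yes <- plogis(-3 + 0.2 * total_visits + 0.002 * total_time)
leads_df <- tibble(
  purchased = factor(ifelse(runif(n) < prob_yes, "yes", "no"), levels = c("yes", "no")),
  total_visits = total_visits,
  total_time = total_time
)

leads_split <- initial_split(leads_df, prop = 0.75, strata = purchased)

logistic_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

last_fit_results <- logistic_model %>%
  last_fit(purchased ~ total_visits + total_time, split = leads_split) %>%
  collect_predictions()

# Class metrics and a probability metric combined in one function
custom_metrics <- metric_set(accuracy, sens, spec, roc_auc)

# truth and estimate are named; the positive-class probabilities go last, unnamed
custom_metrics(last_fit_results,
               truth = purchased,
               estimate = .pred_class,
               .pred_yes)
```

The result is a single tibble with one row per metric, so all four test-set metrics can be compared at a glance.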
6. Let's practice!
Let's practice fitting logistic regression models with the last fit workflow!