1. Reintroducing glmnet
Recall the glmnet model we learned about earlier.
2. glmnet review
It is a linear regression model with built-in variable selection and is a great baseline model for any predictive modeling problem. It is almost always the first model I try on new datasets.
It is a useful baseline because it is fast, uses variable selection to ignore noisy variables, and provides linear regression coefficients you can use to understand patterns in your data. It yields models that are just as interpretable as models from the lm or glm functions in R.
A business analyst could use these coefficients to understand key drivers of churn, but even if you only care about predictions, glmnet is a solid baseline that fits quickly and often provides very accurate models.
3. Example: glmnet on churn data
Glmnet models are simple, fast, and interpretable. Let's fit one to the churn dataset.
After fitting the model, we can plot the results and look at how the AUC of the model changes with alpha and lambda.
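As a minimal sketch of that fit, assuming the caret and glmnet packages are installed: the real churn data is not reproduced here, so the code simulates a stand-in, and the names churn_x, churn_y, my_control, and model_glmnet are illustrative choices rather than anything fixed by caret.

```r
library(caret)   # train(), trainControl(), twoClassSummary()
library(glmnet)  # the engine behind method = "glmnet"

set.seed(42)

# Simulated stand-in for the churn data (an assumption, not the real dataset):
# 10 numeric predictors, of which only the first two drive the outcome.
n <- 500
churn_x <- as.data.frame(matrix(rnorm(n * 10), nrow = n))  # columns V1..V10
prob <- plogis(1.5 * churn_x$V1 - 1.0 * churn_x$V2)
churn_y <- factor(ifelse(runif(n) < prob, "yes", "no"), levels = c("no", "yes"))

# 5-fold cross-validation scored by AUC (caret reports it in a column named "ROC").
my_control <- trainControl(
  method = "cv",
  number = 5,
  summaryFunction = twoClassSummary,
  classProbs = TRUE
)

# caret tries a small default grid of alpha and lambda values.
model_glmnet <- train(
  x = churn_x,
  y = churn_y,
  method = "glmnet",
  metric = "ROC",
  trControl = my_control
)

# Cross-validated AUC for each alpha and lambda combination that was tried.
plot(model_glmnet)

# The combination caret selected.
model_glmnet$bestTune
```

On simulated data the selected alpha and lambda will not match what the churn data gives, so treat the output as an illustration of the workflow only.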
4. Visualize results
In this case, it looks like an alpha of 1 yields the best results on the churn dataset. Caret automatically chooses the best values for alpha and lambda, so we don't need to do anything after looking at this plot, but it's useful to understand how our model works.
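To make the roles of alpha and lambda concrete, here is the penalized objective as documented for the glmnet package, written for a generic loss. The symbols N, x_i, y_i, beta_0, beta, and the loss function are notation introduced here for illustration.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{N}\sum_{i=1}^{N} \ell\!\left(y_i,\; \beta_0 + x_i^{\top}\beta\right)
\;+\;
\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \;+\; \alpha\,\lVert\beta\rVert_1\right]
```

Lambda sets the overall size of the penalty, and alpha mixes the two penalty types: alpha = 0 is a pure ridge penalty, and alpha = 1 is a pure lasso penalty, which can set coefficients exactly to zero. An alpha of 1 therefore corresponds to a lasso model.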
5. Plot the coefficients
We can also plot the glmnet coefficients and see how the coefficients of our best model change as we increase or decrease the penalty on them.
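Here is a rough sketch of such a coefficient plot, assuming the glmnet package is installed. To stay self-contained it fits glmnet directly on simulated data rather than through caret, and x, y, and fit are illustrative names. With a caret model, the fitted glmnet object is stored in the model's finalModel element, so the same kind of plot can be drawn from that object.

```r
library(glmnet)

set.seed(42)

# Simulated stand-in for the churn predictors and outcome (an assumption):
# only x1 and x2 actually influence y.
n <- 500
x <- matrix(rnorm(n * 10), nrow = n, dimnames = list(NULL, paste0("x", 1:10)))
y <- rbinom(n, size = 1, prob = plogis(1.5 * x[, "x1"] - 1.0 * x[, "x2"]))

# alpha = 1 is the lasso penalty; glmnet fits a whole sequence of lambda values.
fit <- glmnet(x, y, family = "binomial", alpha = 1)

# One line per coefficient, traced across the lambda sequence.
# Coefficients shrink toward zero as the penalty gets stronger.
plot(fit, xvar = "lambda", label = TRUE)
```

In a plot like this, the noisy variables tend to drop to zero first as the penalty grows, which is the variable selection at work.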
6. Let’s practice!