Introducing glmnet

1. Introducing glmnet

Now we'll introduce one of my favorite predictive models: the glmnet model.

2. Introducing glmnet

Glmnet models are an extension of generalized linear models (or the glm function in R). However, they have built-in variable selection that is useful on many real-world datasets. In particular, it helps linear regression models better handle collinearity--or correlation among the predictors in a model--and also helps prevent them from being over-confident in results derived from small sample sizes. There are two primary forms of glmnet models: lasso regression, which penalizes the absolute size of the coefficients and can shrink many of them to exactly zero, and ridge regression, which penalizes the squared size of the coefficients, shrinking all of them toward zero without removing any. These penalties are calculated during the model fit and are used by the optimizer to adjust the linear regression coefficients. In other words, a glmnet model attempts to find a parsimonious model, with either few non-zero coefficients or small coefficients, that best fits the input dataset. This is an extremely useful model, and it pairs particularly well with random forest models, as it tends to yield different results.
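
For a rough sense of what the two penalties do in practice, here is a minimal sketch (not from the video) that uses the glmnet package directly on simulated data; all object names and values are made up for illustration.

```r
library(glmnet)

# Simulated data: 50 rows and 40 predictors, so nearly as many columns as rows
set.seed(42)
x <- matrix(rnorm(50 * 40), nrow = 50, ncol = 40)
y <- rnorm(50, mean = x[, 1] - 0.5 * x[, 2])

# Pure lasso (alpha = 1): shrinks many coefficients exactly to zero
lasso_fit <- glmnet(x, y, alpha = 1)

# Pure ridge (alpha = 0): shrinks all coefficients toward zero, none exactly zero
ridge_fit <- glmnet(x, y, alpha = 0)

# Inspect coefficients at one particular penalty strength (lambda = 0.1)
coef(lasso_fit, s = 0.1)
coef(ridge_fit, s = 0.1)
```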

3. Tuning glmnet models

glmnet models are a combination of two types of models: lasso regression (with a penalty on the absolute size of the coefficients, which drives some of them to exactly zero) and ridge regression (with a penalty on the squared size of the coefficients, which shrinks all of them toward zero). Furthermore, glmnet models can fit a *mix* of lasso and ridge models, that is, a model with a small amount of both penalties. This gives glmnet models two parameters to tune: alpha and lambda. The alpha parameter ranges from 0 to 1, where 0 is pure ridge regression, 1 is pure lasso regression, and any value in between is a mix of the two. Lambda, on the other hand, ranges from 0 to positive infinity and controls the size of the penalty. Higher values of lambda yield simpler models, and high enough values of lambda yield intercept-only models that just predict the mean of the response variable in the training data.
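
Here is a minimal sketch of what a custom grid over both tuning parameters could look like using expand.grid with caret; the specific alpha and lambda values are illustrative choices, not from the video.

```r
library(caret)

# Custom tuning grid covering both glmnet parameters:
# alpha = 0 is pure ridge, alpha = 1 is pure lasso, values in between are a mix;
# lambda controls the overall size of the penalty
my_grid <- expand.grid(
  alpha  = c(0, 0.5, 1),
  lambda = seq(0.0001, 1, length = 20)
)
```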

4. Example: "don't overfit"

Let's take a look at the "don't overfit" dataset, which is based on the first Kaggle competition I ever competed in. This dataset has almost as many columns as rows, which makes it challenging for traditional linear regression models. We'll make a custom trainControl object that predicts class probabilities and uses AUC to perform grid search and select models.
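
A trainControl object along those lines might look like the following sketch; the cross-validation settings are assumptions for illustration, and myControl is just a placeholder name.

```r
library(caret)

# twoClassSummary computes ROC AUC, sensitivity, and specificity,
# and it requires class probabilities (classProbs = TRUE)
myControl <- trainControl(
  method = "cv",            # cross-validation (fold count is an assumed choice)
  number = 10,
  summaryFunction = twoClassSummary,
  classProbs = TRUE,
  verboseIter = TRUE
)
```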

5. Try the defaults

We'll start with a simple model that uses the default caret tuning grid (3 values of alpha and 3 values of lambda), and then plot the result.
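
In code, such a call might look roughly like this sketch, reusing the trainControl object from before; the dataset name overfit and target column y are placeholders, not confirmed by the video.

```r
library(caret)

# Fit glmnet with caret's default grid (3 values each of alpha and lambda);
# y should be a two-level factor so twoClassSummary can compute ROC AUC
model <- train(
  y ~ .,
  data = overfit,
  method = "glmnet",
  trControl = myControl,
  metric = "ROC"
)
```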

6. Plot the results

As you can see, the model with an alpha (or mixing percentage) of around 0-point-55 and a medium value of lambda (or regularization parameter) does the best on this dataset.
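
To reproduce that kind of plot on a fitted caret model, a call like the following should work (model is the placeholder name from the sketch above).

```r
# Plot tuning results: AUC across the alpha and lambda values in the grid
plot(model)
```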

7. Let’s practice!

Let's explore the glmnet model in some more detail.