1. Making predictions

GAMs are great for understanding complex systems, and also for making predictions. Here we'll learn to make predictions from the models we fit, and to explain how the model creates them.

2. mgcv's predict() function

As with most models in R, you can make predictions from a GAM object with the predict() function. Simply running predict() on a model, in this case our logistic model of purchasing behavior, will yield a vector of predictions for each data point in the data set we used to fit the model.
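
As a minimal sketch, here is how such a model might be fit and predicted from. The data frame credit and its column names are hypothetical stand-ins for the course's purchasing data, chosen to match the terms discussed later.

```r
library(mgcv)

# Hypothetical logistic GAM of purchasing behavior; 'credit' and its
# columns are illustrative stand-ins, not the course's actual data.
log_mod <- gam(purchase ~ s(n_acts) + s(bd_ratio) + s(mortgage_age) + s(cred_limit),
               data = credit, family = binomial)

# One prediction per row of the data used to fit the model
head(predict(log_mod))
```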

3. Prediction types

By default, the predict() function returns values on the "link" scale, that is, the scale on which the model was fit to the data. For a logistic model, this is the log-odds scale. We can have predict() return results on the probability scale by using the argument type = "response". This is equivalent to running the plogis() logistic function on our predictions.
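
Continuing with the hypothetical log_mod from the sketch above, the two scales relate like this:

```r
# Default: predictions on the link (log-odds) scale
p_link <- predict(log_mod)

# Predictions on the probability scale
p_resp <- predict(log_mod, type = "response")

# type = "response" is the same as applying plogis() to the link-scale fit
all.equal(unname(p_resp), plogis(unname(p_link)))
```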

4. Standard errors

If we set the argument se.fit to TRUE in our call, predict() returns a list where the first element, fit, contains our vector of predictions, and the second element, named se.fit, contains standard errors for our predictions.
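
A quick sketch of inspecting that list, again using the hypothetical log_mod:

```r
p <- predict(log_mod, se.fit = TRUE)

str(p, max.level = 1)  # a list with elements $fit and $se.fit
head(p$fit)            # predictions on the log-odds scale
head(p$se.fit)         # standard errors of those predictions
```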

5. Standard errors (2)

Standard errors are only approximate when we use the probability scale, because the error distribution is not symmetrical on that scale. If you use standard errors to construct confidence intervals for your predictions, you should do so on the log-odds scale and then convert the results to probabilities using the plogis() logistic function. Here, on the left, we see what happens when we instead build intervals by adding or subtracting errors on the response scale: interval endpoints can fall below zero or above one, which makes no sense for probabilities. On the right, we see the result of doing this correctly. When we transform from log-odds to probability after adding the errors, our intervals stay properly bounded between zero and one.
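
A sketch of building intervals the correct way, assuming the hypothetical log_mod and the conventional 1.96 multiplier for an approximate 95% interval:

```r
p <- predict(log_mod, se.fit = TRUE)

# Build the interval on the log-odds scale, then transform
# the endpoints to probabilities with plogis()
lower <- plogis(p$fit - 1.96 * p$se.fit)
upper <- plogis(p$fit + 1.96 * p$se.fit)

range(lower)  # endpoints stay between 0 and 1
range(upper)
```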

6. Predictions on new data

Of course, we are often interested in model predictions beyond the data we used to fit the model, that is, on new, out-of-sample data. The newdata argument lets us pass new data to our model and generate predictions for it, so we can apply our model to new situations. This allows us, for example, to predict on test data after fitting our model on training data.
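
A sketch, where new_credit is a hypothetical held-out data frame with the same predictor columns as the fitting data:

```r
# Probability-scale predictions for out-of-sample data
predict(log_mod, newdata = new_credit, type = "response")
```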

7. Explaining predictions by terms

In multiple regression, it is often useful to understand how each term contributes to an individual prediction. We can examine this by setting the type argument to "terms" in the predict() function. This produces a matrix showing the contribution of each smooth to each prediction. If we sum across all the columns of this matrix and add the intercept, we recover our overall prediction on the log-odds scale.
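
A sketch with the hypothetical log_mod:

```r
term_contrib <- predict(log_mod, type = "terms")
head(term_contrib)  # one column per smooth, one row per data point

# Summing across columns and adding the intercept recovers the
# link-scale (log-odds) prediction
recon <- rowSums(term_contrib) + coef(log_mod)[1]
all.equal(unname(recon), unname(predict(log_mod)))
```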

8. Explaining predictions by terms (2)

Here we look at the first row of this output to see the role each term plays in the predicted probability. This allows us to explain individual model predictions. For instance, for this one data point, the number of accounts has about four times as large an effect in increasing the predicted purchase probability as the balance-credit ratio. Mortgage age and credit limit influence the prediction in the opposite direction, each by about the same amount as the balance-credit ratio. If we add up these terms, add the intercept, and transform using the plogis() function, we get this data point's predicted purchase probability.
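
For instance, a sketch of that calculation for a single data point:

```r
# Term contributions for the first data point
term_contrib[1, ]

# Its predicted purchase probability:
# sum the terms, add the intercept, then apply plogis()
plogis(sum(term_contrib[1, ]) + coef(log_mod)[1])
```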

9. Let's practice!

Now let's make and interpret some predictions.