# Prediction using a GBM model

The **gbm** package uses a `predict()` function to generate predictions from a model, similar to many other machine learning packages in R. When a function like `predict()` works on many different types of input (a GBM model, an RF model, a GLM model, etc.), that is because `predict()` is a *generic* function. It looks at the class of the model object and dispatches to a class-specific method. For a GBM model, that method is `predict.gbm()`, but for convenience's sake, we can just call `predict()` (either works).
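Here is a minimal sketch of how this dispatch works. It uses a hypothetical `toy_model` class rather than gbm itself. Defining a function named `predict.<class>` is enough for the generic `predict()` to find it when it receives an object of that class. `predict.gbm()` plays this role for objects returned by `gbm()`.

```r
# A toy S3 object standing in for a fitted model (hypothetical, illustration only)
toy_model <- structure(list(intercept = 1, slope = 2), class = "toy_model")

# Naming a function predict.<class> makes it a method of the generic predict()
predict.toy_model <- function(object, newdata, ...) {
  object$intercept + object$slope * newdata$x
}

new_points <- data.frame(x = c(0, 1, 2))

# predict() inspects class(toy_model) and dispatches to predict.toy_model(),
# so both calls run the same code
via_generic <- predict(toy_model, newdata = new_points)
via_method  <- predict.toy_model(toy_model, newdata = new_points)
```

With gbm loaded, running `methods(predict)` should list `predict.gbm` among the available methods.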

One thing that's particular to the `predict.gbm()` function, however, is that you need to specify the number of trees used in the prediction. There is no default, so you have to specify this manually. For now, we can use the same number of trees that we specified when training the model, which is 10,000 (though this may not be the optimal number to use).

Another argument that you can specify is `type`, which is only relevant to Bernoulli and Poisson distributed outcomes. When using Bernoulli loss, the returned value is on the log odds scale by default; for Poisson, it's on the log scale. If you instead specify `type = "response"`, then `gbm` converts the predicted values back to the same scale as the outcome: probabilities for Bernoulli and expected counts for Poisson.
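The two scales are linked by the inverse of the link function. The sketch below uses made-up link-scale values rather than real model output. It applies `plogis()` (the inverse logit) for the Bernoulli case and `exp()` for the Poisson case. For a Bernoulli model, `type = "response"` predictions should match `plogis()` applied to the default log-odds predictions, up to floating-point error.

```r
# Hypothetical link-scale predictions (not output from a real model)
log_odds <- c(-3, -0.5, 0, 2.2)

# Bernoulli: the inverse logit maps log odds to probabilities in (0, 1)
probs <- plogis(log_odds)   # equivalent to 1 / (1 + exp(-log_odds))

# Poisson: exponentiating a log-scale prediction gives an expected count
log_counts <- c(-1, 0, 1.5)
expected_counts <- exp(log_counts)

range(log_odds)   # log odds are unbounded
range(probs)      # probabilities stay between 0 and 1
```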

Instructions

- Generate predictions on the test set, using 10,000 trees.
- Generate predictions on the test set using `type = "response"` and 10,000 trees.
- Compare the ranges of the two sets of predictions.
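Here is a minimal end-to-end sketch of these steps, assuming the **gbm** package is installed. The data are simulated. The names `train_df`, `test_df`, and `gbm_model` are hypothetical stand-ins for the exercise's own training set, test set, and fitted model.

```r
library(gbm)

set.seed(42)

# Simulated binary-outcome data (hypothetical stand-in for the exercise data)
n <- 1000
sim <- data.frame(x1 = rnorm(n), x2 = runif(n))
sim$y <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * sim$x1 - sim$x2))

# Hypothetical 80/20 train/test split
train_idx <- sample(n, size = 800)
train_df  <- sim[train_idx, ]
test_df   <- sim[-train_idx, ]

# Bernoulli loss expects a 0/1 outcome; 10,000 trees as in the exercise
gbm_model <- gbm(y ~ ., distribution = "bernoulli",
                 data = train_df, n.trees = 10000)

# 1. Default predictions: log odds (the link scale)
preds1 <- predict(object = gbm_model, newdata = test_df, n.trees = 10000)

# 2. type = "response": predictions converted to probabilities
preds2 <- predict(object = gbm_model, newdata = test_df,
                  n.trees = 10000, type = "response")

# 3. Compare the ranges of the two sets of predictions
range(preds1)
range(preds2)
```

Expect the first range to typically extend outside [0, 1], because log odds are unbounded. The second range stays within [0, 1].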