# Fit an xgboost bike rental model and predict

In this exercise you will fit a gradient boosting model using `xgboost()` to predict the number of bikes rented in an hour as a function of the weather and the type and time of day. You will train the model on data from the month of July and predict on data for the month of August.

The datasets for July and August are loaded into your workspace. Remember that the `vtreat`-ed data no longer has the outcome column, so you must get it from the original data (the `cnt` column).

For convenience, the number of trees to use, `ntrees`, from the previous exercise is in the workspace.

The arguments to `xgboost()` are similar to those of `xgb.cv()`.
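The shape of such a call can be sketched on synthetic data. This is a minimal illustration, not the exercise solution: `train_treat` and `train_cnt` are hypothetical stand-ins for `bikesJuly.treat` and `bikesJuly$cnt`, the value of `ntrees` is made up, and the sketch assumes the classic `xgboost()` interface that takes `data` and `label` (newer releases of the R package may use a different signature). It also uses `"reg:squarederror"`, the name under which newer xgboost releases provide the squared-error objective that older releases call `"reg:linear"`.

```r
library(xgboost)

# Synthetic stand-ins (hypothetical) for bikesJuly.treat and bikesJuly$cnt
set.seed(42)
n <- 200
train_treat <- data.frame(
  hr   = sample(0:23, n, replace = TRUE),
  temp = runif(n),
  hum  = runif(n)
)
train_cnt <- round(50 + 10 * train_treat$hr + 100 * train_treat$temp +
                     rnorm(n, sd = 20))

# Hypothetical value; in the exercise ntrees comes from the previous xgb.cv() run
ntrees <- 20

# Assumes the classic interface: a numeric matrix plus a separate label vector
model <- xgboost(
  data      = as.matrix(train_treat),  # data frame converted to a numeric matrix
  label     = train_cnt,               # outcome is supplied separately
  nrounds   = ntrees,
  objective = "reg:squarederror",      # squared-error regression objective
  eta       = 0.3,
  max_depth = 6,
  verbose   = 0                        # silent
)
```

The outcome is passed separately because the treated data frame holds only the input variables.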

Instructions

The data frames `bikesJuly`, `bikesJuly.treat`, `bikesAugust`, and `bikesAugust.treat` are in the workspace, as is the number of trees, `ntrees`.

- Fill in the blanks to run `xgboost()` on the July data. Assign the model to the variable `model`.
  - Use `as.matrix()` to convert the vtreated data frame to a matrix.
  - The objective should be "reg:linear".
  - Use `ntrees` rounds.
  - Set `eta` to 0.3, `max_depth` to 6, and `verbose` to 0 (silent).
- Now call `predict()` on `bikesAugust.treat` to predict the number of bikes rented in August.
  - Use `as.matrix()` to convert the `vtreat`-ed test data into a matrix.
  - Add the predictions to `bikesAugust` as the column `pred`.
- Fill in the blanks to plot actual bike rental counts versus the predictions (predictions on the x-axis).
- Do you see a possible problem with the predictions?
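
The predict-and-plot steps above can be sketched end to end on synthetic data. This is an illustration under stated assumptions rather than the exercise solution: `make_data()`, `train`, `test`, and `features` are hypothetical names, the data are randomly generated, the model is fit with the classic `xgboost()` interface (`data` and `label` arguments), and base graphics are used only to keep dependencies minimal; the exercise itself may use a different plotting library.

```r
library(xgboost)

# Hypothetical helper that generates toy hourly rental data
set.seed(1)
make_data <- function(n) {
  x <- data.frame(hr = sample(0:23, n, replace = TRUE), temp = runif(n))
  x$cnt <- pmax(0, round(20 * x$hr * x$temp + rnorm(n, sd = 15)))
  x
}
train <- make_data(300)      # stand-in for the July data
test  <- make_data(100)      # stand-in for the August data
features <- c("hr", "temp")  # input columns only, no outcome

# Fit on the training month (classic interface assumed)
model <- xgboost(
  data      = as.matrix(train[, features]),
  label     = train$cnt,
  nrounds   = 20,
  objective = "reg:squarederror",
  eta       = 0.3,
  max_depth = 6,
  verbose   = 0
)

# Predict on the test month and store the result next to the actual counts
test$pred <- predict(model, as.matrix(test[, features]))

# Actual counts versus predictions, with predictions on the x-axis
plot(test$pred, test$cnt, xlab = "pred", ylab = "cnt")
abline(0, 1)  # points on this line are predicted exactly

# How many predictions fall below zero, which is implausible for a count
sum(test$pred < 0)
```

When inspecting a plot like this, one thing worth checking is whether any predictions fall outside the plausible range for a count, for example below zero.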