Exercise

Fit an xgboost bike rental model and predict

In this exercise you will fit a gradient boosting model using xgboost() to predict the number of bikes rented in an hour as a function of the weather and the type and time of day. You will train the model on data from the month of July and predict on data for the month of August.

The datasets for July and August are loaded into your workspace. Remember the vtreat-ed data no longer has the outcome column, so you must get it from the original data (the cnt column).

For convenience, the number of trees to use, ntrees, from the previous exercise is in the workspace.

The arguments to xgboost() are similar to those of xgb.cv().

Instructions

The data frames bikesJuly, bikesJuly.treat, bikesAugust and bikesAugust.treat are in the workspace. The number of trees ntrees is in the workspace.

  • Fill in the blanks to run xgboost() on the July data. Assign the model to the variable model.
    • Use as.matrix() to convert the vtreat-ed data frame to a matrix.
    • The objective should be "reg:linear" (newer xgboost releases deprecate this name in favor of "reg:squarederror").
    • Use ntrees rounds.
    • Set eta to 0.3, max_depth to 6, and verbose to 0 (silent).
  • Now call predict() on bikesAugust.treat to predict the number of bikes rented in August.
    • Use as.matrix() to convert the vtreat-ed test data into a matrix.
    • Add the predictions to bikesAugust as the column pred.
  • Fill in the blanks to plot actual bike rental counts versus the predictions (predictions on the x-axis).
    • Do you see a possible problem with the predictions?
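
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the exercise solution: the data frames are small synthetic stand-ins built by a hypothetical make_toy() helper (not the real bike data), the "treated" frames are plain numeric predictor columns rather than vtreat output, ntrees is an arbitrary placeholder, and the objective is written as "reg:squarederror". It also uses xgb.train() with an xgb.DMatrix instead of the xgboost() wrapper, since the wrapper's argument names appear to differ between package releases; the base-graphics plot is likewise a stand-in for whatever plotting code the exercise scaffolds.

```r
library(xgboost)

# Synthetic stand-ins for the workspace objects (NOT the real bike data).
set.seed(123)
make_toy <- function(n) {
  d <- data.frame(hr   = sample(0:23, n, replace = TRUE),
                  temp = runif(n),
                  hum  = runif(n))
  # Made-up relationship, only so the model has something to learn.
  d$cnt <- round(pmax(0, 200 * d$temp + 10 * d$hr - 50 * d$hum + rnorm(n, sd = 20)))
  d
}
bikesJuly   <- make_toy(500)
bikesAugust <- make_toy(300)

# Stand-ins for the vtreat-ed frames: predictors only, no outcome column.
vars <- c("hr", "temp", "hum")
bikesJuly.treat   <- bikesJuly[, vars]
bikesAugust.treat <- bikesAugust[, vars]

ntrees <- 50  # placeholder; in the exercise this comes from the previous step

# The outcome is taken from the original data, not from the treated frame.
dtrain <- xgb.DMatrix(data = as.matrix(bikesJuly.treat), label = bikesJuly$cnt)

model <- xgb.train(
  params  = list(objective = "reg:squarederror", eta = 0.3, max_depth = 6),
  data    = dtrain,
  nrounds = ntrees,
  verbose = 0
)

# Predict on the treated test data and store the result as the column pred.
bikesAugust$pred <- predict(model, as.matrix(bikesAugust.treat))

# Actual counts versus predictions, with predictions on the x-axis.
plot(bikesAugust$pred, bikesAugust$cnt, xlab = "pred", ylab = "cnt")
abline(0, 1)  # points on this line are predicted exactly
```

When inspecting the plot, it can help to also check range(bikesAugust$pred) against the values a rental count can legitimately take.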