Fit an xgboost bike rental model and predict
In this exercise, you will fit a gradient boosting model using xgboost() to predict the number of bikes rented in an hour as a function of the weather and the type and time of day. You will train the model on data from the month of July and predict on data for the month of August.
The data frames bikesJuly, bikesJuly.treat, bikesAugust, and bikesAugust.treat have also been pre-loaded. Remember the vtreat-ed data no longer has the outcome column, so you must get it from the original data (the cnt column).
For convenience, the number of trees to use, ntrees, from the previous exercise is available to use.
The arguments to xgboost() (docs) are similar to those of xgb.cv().
This exercise is part of the course Supervised Learning in R: Regression.
Exercise instructions
- Fill in the blanks to run xgboost() on the July data.
  - Use as.matrix() to convert the vtreated data frame to a matrix.
  - The objective should be "reg:squarederror".
  - Use ntrees rounds.
  - Set eta to 0.75, max_depth to 5, and verbose to FALSE (silent).
- Now call predict() on bikesAugust.treat to predict the number of bikes rented in August.
  - Use as.matrix() to convert the vtreat-ed test data into a matrix.
  - Add the predictions to bikesAugust as the column pred.
- Fill in the blanks to plot actual bike rental counts versus the predictions (predictions on the x-axis).
- Do you see a possible problem with the predictions?
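When the outcome is a count, one check that may help with the last question is to look at the range of the predictions alongside the plot. The sketch below uses a small made-up vector, pred, as a stand-in for bikesAugust$pred; the numbers are illustrative only and are not results from the exercise.

```r
# Hypothetical predictions standing in for bikesAugust$pred
pred <- c(12.4, -3.1, 250.7, 0.8, -0.5, 87.2)

# Smallest and largest predicted values
range(pred)

# Number of predictions below zero, which no real count can be
n_negative <- sum(pred < 0)
n_negative
```

On the real data, the same two calls can be run directly on the pred column once it has been added to bikesAugust.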
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Run xgboost
bike_model_xgb <- xgboost(data = ___,      # training data as matrix
                          label = ___,     # column of outcomes
                          nrounds = ___,   # number of trees to build
                          objective = ___, # objective
                          eta = ___,
                          max_depth = ___,
                          verbose = FALSE  # silent
)

# Make predictions
bikesAugust$pred <- ___(___, ___(___))

# Plot predictions (on x axis) vs actual bike rental count
ggplot(bikesAugust, aes(x = ___, y = ___)) +
  geom_point() +
  geom_abline()
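The same fit-then-predict pattern can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions, not the exercise solution: the data, the make_data() helper, and the column names hr and temp are made up, ntrees is set to a placeholder value, and the model is fit with xgb.train() on an xgb.DMatrix rather than with the xgboost(data = , label = ) form used in the template above.

```r
library(xgboost)

set.seed(42)

# Hypothetical stand-in for already-treated, all-numeric data
make_data <- function(n) {
  hr   <- sample(0:23, n, replace = TRUE)
  temp <- runif(n)
  # Count-like outcome that depends on hour of day and temperature
  cnt  <- rpois(n, lambda = 20 + 100 * temp + 5 * pmin(hr, 24 - hr))
  data.frame(hr = hr, temp = temp, cnt = cnt)
}
train <- make_data(500)
test  <- make_data(200)

# Feature matrices hold the inputs only; the outcome stays in the data frame
vars    <- c("hr", "temp")
train_x <- as.matrix(train[, vars])
test_x  <- as.matrix(test[, vars])

# xgb.DMatrix bundles the feature matrix with the outcome vector
dtrain <- xgb.DMatrix(data = train_x, label = train$cnt)

ntrees <- 20  # placeholder; in the exercise this value comes from xgb.cv()

model <- xgb.train(
  params  = list(objective = "reg:squarederror", eta = 0.75, max_depth = 5),
  data    = dtrain,
  nrounds = ntrees,
  verbose = 0
)

# Predict on the test matrix and store the result next to the actual counts
test$pred <- predict(model, xgb.DMatrix(test_x))
```

The key point carried over from the exercise is that the feature matrix and the outcome vector are supplied separately, and that the test inputs go through the same as.matrix() conversion before predict() is called.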