Build a random forest model for bike rentals
In this exercise, you will again build a model to predict the number of bikes rented in an hour as a function of the weather, the type of day (holiday, working day, or weekend), and the time of day. You will train the model on data from the month of July.
You will use the ranger package to fit the random forest model. For this exercise, the key arguments to the ranger() call are:
- formula: the model formula.
- data: the training data frame.
- num.trees: the number of trees in the forest.
- respect.unordered.factors: specifies how to treat unordered factor variables. We recommend setting this to "order" for regression.
- seed: because this is a random algorithm, you will set the seed to get reproducible results.
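To make the "order" setting less abstract: according to the ranger documentation, for regression it sorts the levels of each unordered factor by the mean response once, before growing the trees, and then splits on the factor as if it were ordered. The base-R sketch below only illustrates that idea; the helper order_levels_by_mean() is hypothetical, ranger does this internally, and mtcars (with gear as the factor and mpg as the outcome) stands in for bikesJuly.

```r
# Hypothetical helper illustrating the idea behind
# respect.unordered.factors = "order" for regression:
# sort factor levels by the mean outcome within each level.
order_levels_by_mean <- function(f, y) {
  level_means <- tapply(y, f, mean)        # mean outcome per level (named 1-D array)
  names(level_means)[order(level_means)]   # level names, smallest mean first
}

gear <- factor(mtcars$gear)                # unordered factor with levels "3", "4", "5"
new_levels <- order_levels_by_mean(gear, mtcars$mpg)
print(new_levels)
# [1] "3" "5" "4"

# Recode as an ordered factor, so a split can be a simple threshold on the level order
gear_ordered <- factor(gear, levels = new_levels, ordered = TRUE)
print(levels(gear_ordered))
```

Once the levels are ordered, a split only has to choose a cut point along that order instead of searching over every subset of levels, which is why this option is cheap for factors with many levels.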
Since there are a lot of input variables, for convenience we will specify the outcome and the inputs in the variables outcome and vars, and use paste() to assemble a string representing the model formula.
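The paste() pattern can be tried on its own. The sketch below uses mtcars column names as stand-ins (outcome_demo and vars_demo are illustrative names, not part of the exercise): the inner paste() joins the inputs with " + ", and the outer paste() adds the outcome and "~" using its default single-space separator. The ranger documentation states that its formula argument accepts either a formula object or a character string; as.formula() converts the string for functions such as lm() that expect a formula.

```r
# Build a model formula string from an outcome name and a vector of input names.
# mtcars column names are stand-ins for the exercise's outcome and vars.
outcome_demo <- "mpg"
vars_demo <- c("wt", "hp", "cyl")

fmla_demo <- paste(outcome_demo, "~", paste(vars_demo, collapse = " + "))
print(fmla_demo)
# [1] "mpg ~ wt + hp + cyl"

# Convert the string to a formula object and use it in a model fit
f <- as.formula(fmla_demo)
print(all.vars(f))                 # outcome followed by the inputs
fit_demo <- lm(f, data = mtcars)
print(coef(fit_demo))
```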
The data frame bikesJuly has been pre-loaded. The sample code specifies the names of the outcome and input variables.
This exercise is part of the course Supervised Learning in R: Regression.
Exercise instructions
- Fill in the blanks to create the formula fmla expressing cnt as a function of the inputs. Print it.
- Load the package ranger.
- Use ranger to fit a model to the bikesJuly data: bike_model_rf.
  - The first argument to ranger() is the formula, fmla.
  - Use 500 trees and respect.unordered.factors = "order".
  - Set the seed to seed for reproducible results.
- Print the model. What is the R-squared?
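The R-squared printed with a ranger regression model is, according to the ranger documentation, computed on out-of-bag predictions, so it estimates performance on rows each tree did not see; it is also stored in the fitted object as bike_model_rf$r.squared. As a reminder of what the number means, the sketch below computes R-squared as 1 minus the residual sum of squares over the total sum of squares for an ordinary lm() fit on mtcars (a stand-in dataset) and checks it against summary(). This is not ranger's computation, which works from out-of-bag predictions and may normalize slightly differently.

```r
# R-squared = 1 - SSE / SST, checked against summary.lm() on stand-in data
fit <- lm(mpg ~ wt + hp + cyl, data = mtcars)

y <- mtcars$mpg
sse <- sum((y - fitted(fit))^2)   # residual sum of squares
sst <- sum((y - mean(y))^2)       # total sum of squares around the mean
r2 <- 1 - sse / sst

print(r2)
print(summary(fit)$r.squared)     # should agree with r2
```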
Have a go at this exercise by completing this sample code.
# bikesJuly is available
str(bikesJuly)
# Random seed to reproduce results (pre-loaded; print its value)
seed
# The outcome column
(outcome <- "cnt")
# The input variables
(vars <- c("hr", "holiday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed"))
# Create the formula string for bikes rented as a function of the inputs
(fmla <- paste(___, "~", paste(___, collapse = " + ")))
# Load the package ranger
___
# Fit and print the random forest model
(bike_model_rf <- ranger(___, # formula
___, # data
num.trees = ___,
respect.unordered.factors = ___,
seed = ___))