Build a random forest model for bike rentals

In this exercise, you will again build a model to predict the number of bikes rented in an hour as a function of the weather, the type of day (holiday, working day, or weekend), and the time of day. You will train the model on data from the month of July.

You will use the ranger package to fit the random forest model. For this exercise, the key arguments to the ranger() call are:

  • formula: the model formula, with the outcome on the left and the inputs on the right.
  • data: the data frame to train on.
  • num.trees: the number of trees in the forest.
  • respect.unordered.factors: specifies how to treat unordered factor variables. We recommend setting this to "order" for regression.
  • seed: because this is a random algorithm, you will set the seed to get reproducible results.
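
As a rough illustration of how these arguments fit together, here is a minimal sketch on R's built-in mtcars data rather than the bike data. The choice of mtcars, the columns used, the 100 trees, and the seed value 42 are all assumptions made for this example only. It fits the same forest twice with the same seed to show what "reproducible" means in practice.

```r
# Sketch only: uses the built-in mtcars data, not bikesJuly.
library(ranger)

# Fit a small regression forest; 100 trees and seed 42 are arbitrary choices
fit_a <- ranger(mpg ~ wt + hp + disp,                  # formula
                data = mtcars,                         # data
                num.trees = 100,
                respect.unordered.factors = "order",
                seed = 42)

# Refit with identical arguments, including the seed
fit_b <- ranger(mpg ~ wt + hp + disp,
                data = mtcars,
                num.trees = 100,
                respect.unordered.factors = "order",
                seed = 42)

# With the same seed, the out-of-bag predictions of the two fits should agree
all.equal(fit_a$predictions, fit_b$predictions)

# The fitted object stores the number of trees and the out-of-bag R-squared
fit_a$num.trees
fit_a$r.squared
```

Printing a fitted ranger object also reports the out-of-bag prediction error and R-squared, so the same pattern applies when the model is wrapped in parentheses to print it.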

Since there are a lot of input variables, for convenience we will specify the outcome and the inputs in the variables outcome and vars, and use paste() to assemble a string representing the model formula.
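
The paste() pattern described above can be sketched on a toy example. The names toy_outcome, toy_vars, and toy_fmla and the mtcars column names are hypothetical, chosen only for illustration; they are not the variables from this exercise.

```r
# Hypothetical outcome and inputs (column names from the built-in mtcars data)
toy_outcome <- "mpg"
toy_vars <- c("wt", "hp", "disp")

# The inner paste() joins the inputs with " + ";
# the outer paste() puts the outcome and "~" in front, separated by spaces
toy_fmla <- paste(toy_outcome, "~", paste(toy_vars, collapse = " + "))
print(toy_fmla)
# [1] "mpg ~ wt + hp + disp"

# If a modeling function needs a formula object rather than a string,
# as.formula() converts it
toy_formula <- as.formula(toy_fmla)
class(toy_formula)
# [1] "formula"
```

Building the formula from character vectors keeps the list of inputs in one place, which is convenient when the same inputs are reused across several models.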

The data frame bikesJuly has been pre-loaded. The sample code specifies the names of the outcome and input variables.

This exercise is part of the course

Supervised Learning in R: Regression

Exercise instructions

  • Fill in the blanks to create the formula fmla expressing cnt as a function of the inputs. Print it.
  • Load the package ranger.
  • Use ranger to fit a model to the bikesJuly data: bike_model_rf.
    • The first argument to ranger() is the formula, fmla.
    • Use 500 trees and respect.unordered.factors = "order".
    • Set the seed to seed for reproducible results.
    • Print the model. What is the R-squared?

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# bikesJuly is available
str(bikesJuly)

# Random seed to reproduce results
seed

# The outcome column
(outcome <- "cnt")

# The input variables
(vars <- c("hr", "holiday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed"))

# Create the formula string for bikes rented as a function of the inputs
(fmla <- paste(___, "~", paste(___, collapse = " + ")))

# Load the package ranger
___

# Fit and print the random forest model
(bike_model_rf <- ranger(___, # formula 
                         ___, # data
                         num.trees = ___, 
                         respect.unordered.factors = ___, 
                         seed = ___))