
Fit a model to predict bike rental counts

In this exercise, you will build a model to predict the number of bikes rented in an hour as a function of the weather, the type of day (holiday, working day, or weekend), and the time of day. You will train the model on data from the month of July.

The data frame has the columns:

  • cnt: the number of bikes rented in that hour (the outcome)
  • hr: the hour of the day (0-23, as a factor)
  • holiday: TRUE/FALSE
  • workingday: TRUE if neither a holiday nor a weekend, else FALSE
  • weathersit: categorical, "Clear to partly cloudy"/"Light Precipitation"/"Misty"
  • temp: normalized temperature in Celsius
  • atemp: normalized "feeling" temperature in Celsius
  • hum: normalized humidity
  • windspeed: normalized windspeed
  • instant: the time index, i.e. the number of hours since the beginning of the dataset (not used as a model input)
  • mnth and yr: month and year indices (not used as model inputs)

Remember that you must specify family = poisson or family = quasipoisson when using glm() to fit a count model.
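As a minimal sketch on simulated data (the data frame sim, its columns and the seed are hypothetical, not part of the exercise), the two families produce identical coefficient estimates and differ only in how they treat dispersion: family = poisson fixes the dispersion at 1, while family = quasipoisson estimates it from the data, which widens the standard errors when the counts are overdispersed.

```r
# Hypothetical simulated counts, NOT the bikesJuly data
set.seed(42)
n <- 500
sim <- data.frame(x = runif(n))
# Negative binomial draws with a small size parameter are overdispersed (variance > mean)
sim$y <- rnbinom(n, mu = exp(1 + 2 * sim$x), size = 2)

# Same linear predictor, two count families
m_pois  <- glm(y ~ x, data = sim, family = poisson)
m_quasi <- glm(y ~ x, data = sim, family = quasipoisson)

# The point estimates agree...
coef(m_pois)
coef(m_quasi)

# ...but the dispersion differs: fixed at 1 for poisson, estimated for quasipoisson
summary(m_pois)$dispersion
summary(m_quasi)$dispersion
```

Because the quasipoisson standard errors are the poisson ones scaled by the square root of the estimated dispersion, ignoring overdispersion makes the poisson fit look more certain than it is.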

Because there are many input variables, for convenience we store the names of the outcome and the inputs in variables and use paste() to assemble a string representing the model formula.

The bikesJuly data frame is available to use. The names of the outcome variable and the input variables have also been loaded as the variables outcome and vars, respectively.
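A minimal sketch of the formula assembly, using hypothetical stand-ins for outcome and vars (here the eight inputs from the column list above; the preloaded vars in the exercise may differ). The inner paste() joins the inputs with " + ", and the outer paste() puts the outcome and "~" in front. as.formula() turns the string into a formula object.

```r
# Hypothetical stand-ins; the exercise preloads the real outcome and vars
outcome <- "cnt"
vars <- c("hr", "holiday", "workingday", "weathersit",
          "temp", "atemp", "hum", "windspeed")

# Inner paste() joins the inputs; outer paste() adds "cnt ~" (default sep = " ")
fmla <- paste(outcome, "~", paste(vars, collapse = " + "))
print(fmla)
# [1] "cnt ~ hr + holiday + workingday + weathersit + temp + atemp + hum + windspeed"

# Convert the string to a formula object and list the variables it uses
fmla_obj <- as.formula(fmla)
all.vars(fmla_obj)
```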

This exercise is part of the course

Supervised Learning in R: Regression


Exercise instructions

  • Fill in the blanks to create the formula fmla expressing cnt as a function of the inputs. Print it.
  • Calculate the mean (mean()) and variance (var()) of bikesJuly$cnt.
    • Should you use poisson or quasipoisson regression?
  • Use glm() to fit a model to the bikesJuly data: bike_model.
  • Use glance() (from the broom package) to look at the model's fit statistics. Assign the output of glance() to the variable perf.
  • Calculate the pseudo-R-squared of the model.
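The mean/variance check and the pseudo-R-squared can be sketched on simulated counts (the data frame sim and its columns are hypothetical, not bikesJuly). If the variance of the outcome is much larger than its mean, the counts are overdispersed and quasipoisson is the safer family. The pseudo-R-squared here is 1 - deviance / null.deviance; glance() from broom reports both quantities as the deviance and null.deviance columns, but this sketch reads them from the fitted glm object so that it runs with base R only.

```r
# Hypothetical simulated data standing in for bikesJuly
set.seed(1)
n <- 300
sim <- data.frame(x = runif(n))
sim$y <- rnbinom(n, mu = exp(2 + sim$x), size = 3)

# Compare the mean and variance of the outcome (crude rule of thumb)
mean_y <- mean(sim$y)
var_y  <- var(sim$y)
family_choice <- if (var_y > mean_y) "quasipoisson" else "poisson"
family_choice

# glm() accepts the family as a character string and looks up the function
model <- glm(y ~ x, data = sim, family = family_choice)

# Pseudo-R-squared: fraction of the null deviance explained by the model
pseudoR2 <- 1 - model$deviance / model$null.deviance
pseudoR2
```

In practice, a variance only slightly above the mean may still be adequately modeled with poisson; the strict comparison above is just the simplest version of the check.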


Have a go at this exercise by completing this sample code.

# bikesJuly is available
str(bikesJuly)

# The outcome column
outcome 

# The inputs to use
vars 

# Create the formula string for bikes rented as a function of the inputs
(fmla <- paste(___, "~", paste(___, collapse = " + ")))

# Calculate the mean and variance of the outcome
(mean_bikes <- ___)
(var_bikes <- ___)

# Fit the model
bike_model <- ___

# Call glance
(perf <- ___)

# Calculate pseudo-R-squared
(pseudoR2 <- ___)