Exercise

Fit a model to predict bike rental counts

In this exercise you will build a model to predict the number of bikes rented in an hour as a function of the weather, the type of day (holiday, working day, or weekend), and the time of day. You will train the model on data from the month of July.

The data frame has the columns:

  • cnt: the number of bikes rented in that hour (the outcome)
  • hr: the hour of the day (0-23, as a factor)
  • holiday: TRUE/FALSE
  • workingday: TRUE if neither a holiday nor a weekend, else FALSE
  • weathersit: categorical, "Clear to partly cloudy"/"Light Precipitation"/"Misty"
  • temp: normalized temperature in Celsius
  • atemp: normalized "feeling" temperature in Celsius
  • hum: normalized humidity
  • windspeed: normalized windspeed
  • instant: the time index -- number of hours since beginning of data set (not a model input)
  • mnth and yr: month and year indices (not model inputs)

Remember that you must specify family = poisson or family = quasipoisson when using glm() to fit a count model.
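The choice between the two families usually rests on whether the variance of the counts is close to their mean (poisson) or much larger (quasipoisson). A minimal sketch of that check on simulated counts, not the bike data; the names counts_pois and counts_over are made up for illustration, and comparing the raw mean and variance is only a rough heuristic:

```r
# Simulated counts only; these are not the bike rental data.
set.seed(42)

# Poisson counts: the variance should be close to the mean.
counts_pois <- rpois(5000, lambda = 10)

# Negative binomial counts with the same mean but extra spread
# (theoretical variance = mu + mu^2 / size = 10 + 100 / 2 = 60).
counts_over <- rnbinom(5000, mu = 10, size = 2)

# Ratio of variance to mean for each sample.
ratio_pois <- var(counts_pois) / mean(counts_pois)
ratio_over <- var(counts_over) / mean(counts_over)

ratio_pois  # near 1: family = poisson is plausible
ratio_over  # well above 1: overdispersed, so family = quasipoisson is the safer choice
```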

Since there are a lot of input variables, for convenience we will specify the outcome and the inputs in variables, and use paste() to assemble a string representing the model formula.
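As a rough illustration of the paste() approach, the sketch below builds a formula string from an outcome name and a vector of input names. The values given to outcome and vars here are assumptions based on the column list above; the actual workspace variables may differ.

```r
# Assumed stand-ins for the workspace variables `outcome` and `vars`.
outcome <- "cnt"
vars <- c("hr", "holiday", "workingday", "weathersit",
          "temp", "atemp", "hum", "windspeed")

# Join the inputs with " + ", then put "outcome ~" in front.
fmla_string <- paste(outcome, "~", paste(vars, collapse = " + "))
fmla_string

# Convert the string to a formula object for use in modeling functions.
fmla <- as.formula(fmla_string)
```

The inner paste() with collapse = " + " turns the vector into a single string; the outer paste() joins its arguments with a space by default.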

Instructions

The data frame bikesJuly is in the workspace. The names of the outcome variable and the input variables are also in the workspace as the variables outcome and vars respectively.

  • Fill in the blanks to create the formula fmla expressing cnt as a function of the inputs. Print it.
  • Calculate the mean (mean()) and variance (var()) of bikesJuly$cnt.
    • Based on how the variance compares with the mean, should you use poisson or quasipoisson regression?
  • Use glm() to fit a model to the bikesJuly data and assign it to bike_model.
  • Use glance() to look at the model's fit statistics. Assign the output of glance() to the variable perf.
  • Calculate the pseudo-R-squared of the model: 1 - deviance/null.deviance.
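
The steps above can be sketched end to end on simulated data. This is not the exercise solution: the data frame sim, the model name bike_model_sim, and the two-input formula are hypothetical stand-ins for bikesJuly, bike_model, and the full formula, and every value is made up. The sketch assumes glance() comes from the broom package and falls back to the model object if broom is not installed.

```r
# Simulated stand-in for bikesJuly; column names follow the exercise,
# but every value here is made up.
set.seed(1)
n <- 500
sim <- data.frame(
  hr   = factor(sample(0:23, n, replace = TRUE)),
  temp = runif(n)
)
# Overdispersed counts whose mean rises with temperature.
sim$cnt <- rnbinom(n, mu = exp(3 + 1.5 * sim$temp), size = 2)

# Assemble the formula from strings, as in the exercise.
fmla <- as.formula(paste("cnt", "~", paste(c("hr", "temp"), collapse = " + ")))

# A variance far above the mean points to quasipoisson rather than poisson.
mean(sim$cnt)
var(sim$cnt)

bike_model_sim <- glm(fmla, data = sim, family = quasipoisson)

# broom::glance() returns a one-row data frame that includes the columns
# deviance and null.deviance; use the model object if broom is unavailable.
if (requireNamespace("broom", quietly = TRUE)) {
  perf <- broom::glance(bike_model_sim)
} else {
  perf <- data.frame(deviance = bike_model_sim$deviance,
                     null.deviance = bike_model_sim$null.deviance)
}

# Pseudo-R-squared: the share of the null deviance explained by the model.
pseudoR2 <- 1 - perf$deviance / perf$null.deviance
pseudoR2
```

A pseudo-R-squared near 1 means the model's deviance is small relative to that of the intercept-only model; a value near 0 means the inputs explain little.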