Fit a model to predict bike rental counts
In this exercise, you will build a model to predict the number of bikes rented in an hour as a function of the weather, the type of day (holiday, working day, or weekend), and the time of day. You will train the model on data from the month of July.
The data frame has the columns:
- cnt: the number of bikes rented in that hour (the outcome)
- hr: the hour of the day (0-23, as a factor)
- holiday: TRUE/FALSE
- workingday: TRUE if neither a holiday nor a weekend, else FALSE
- weathersit: categorical, "Clear to partly cloudy"/"Light Precipitation"/"Misty"
- temp: normalized temperature in Celsius
- atemp: normalized "feeling" temperature in Celsius
- hum: normalized humidity
- windspeed: normalized windspeed
- instant: the time index -- number of hours since beginning of dataset (not a variable)
- mnth and yr: month and year indices (not variables)
Remember that you must specify family = poisson or family = quasipoisson when using glm() to fit a count model.
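As a rough illustration of how the family argument is passed to glm(), the sketch below fits both families to warpbreaks, a small count data set that ships with base R. It stands in for the bike data only for demonstration; the names pois_fit and quasi_fit are made up for this example.

```r
# warpbreaks ships with base R (datasets package); breaks is a count outcome
data(warpbreaks)

# Poisson fit: assumes the variance of the counts equals their mean
pois_fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Quasipoisson fit: same mean model, but the dispersion is estimated from the data
quasi_fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = quasipoisson)

# Both families give the same coefficient estimates; the standard errors differ
all.equal(coef(pois_fit), coef(quasi_fit))

# Estimated dispersion of the quasipoisson fit (values well above 1 suggest overdispersion)
summary(quasi_fit)$dispersion
```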
Since there are a lot of input variables, for convenience we will specify the outcome and the inputs in variables, and use paste() to assemble a string representing the model formula.
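The paste() pattern can be sketched on a toy case. The example below again uses the built-in warpbreaks data rather than the bike data, and outcome_name, input_names, and fmla_string are hypothetical names chosen for this illustration.

```r
# Hypothetical stand-ins for the outcome and input name variables
outcome_name <- "breaks"
input_names <- c("wool", "tension")

# The inner paste() joins the inputs with " + ";
# the outer paste() puts the outcome and "~" in front
fmla_string <- paste(outcome_name, "~", paste(input_names, collapse = " + "))
print(fmla_string)

# Convert the string to a formula explicitly before fitting
fit <- glm(as.formula(fmla_string), data = warpbreaks, family = quasipoisson)
```

Building the formula from a vector of names means a change to the input list only has to be made in one place.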
The bikesJuly data frame is available to use. The names of the outcome variable and the input variables have also been loaded as the variables outcome and vars, respectively.
This exercise is part of the course Supervised Learning in R: Regression.
Exercise instructions
- Fill in the blanks to create the formula fmla expressing cnt as a function of the inputs. Print it.
- Calculate the mean (mean()) and variance (var()) of bikesJuly$cnt.
  - Should you use poisson or quasipoisson regression?
- Use glm() to fit a model to the bikesJuly data: bike_model.
- Use glance() to look at the model's fit statistics. Assign the output of glance() to the variable perf.
- Calculate the pseudo-R-squared of the model.
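The mean-versus-variance comparison in the second step reflects the Poisson assumption that the two are equal. When the variance is much larger than the mean, quasipoisson is usually the safer choice. A minimal sketch of that check on the built-in warpbreaks counts (not the bike data; the variable names are illustrative):

```r
# Compare the mean and variance of a count outcome
mean_breaks <- mean(warpbreaks$breaks)
var_breaks <- var(warpbreaks$breaks)

# Print both so they can be compared side by side
c(mean = mean_breaks, variance = var_breaks)

# TRUE here means the counts vary more than a Poisson model assumes
overdispersed <- var_breaks > mean_breaks
overdispersed
```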
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# bikesJuly is available
str(bikesJuly)
# The outcome column
outcome
# The inputs to use
vars
# Create the formula string for bikes rented as a function of the inputs
(fmla <- paste(___, "~", paste(___, collapse = " + ")))
# Calculate the mean and variance of the outcome
(mean_bikes <- ___)
(var_bikes <- ___)
# Fit the model
bike_model <- ___
# Call glance
(perf <- ___)
# Calculate pseudo-R-squared
(pseudoR2 <- ___)
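
For reference, one common definition of pseudo-R-squared for a GLM is one minus the ratio of the residual deviance to the null deviance. The sketch below computes it from the deviance and null.deviance components of a fitted glm object, again on the built-in warpbreaks data. glance() from the broom package reports the same two quantities in columns named deviance and null.deviance. Base R is used here so the example runs without extra packages.

```r
# Fit a small count model on a built-in data set (a stand-in for the bike data)
fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = quasipoisson)

# Residual deviance: how far the fitted model is from a perfect fit
# Null deviance: the same measure for an intercept-only model
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance
pseudo_r2
```

Values closer to 1 indicate that the inputs account for more of the deviance than the intercept-only model does.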