Fit a model to predict bike rental counts
In this exercise, you will build a model to predict the number of bikes rented in an hour as a function of the weather, the type of day (holiday, working day, or weekend), and the time of day. You will train the model on data from the month of July.
The data frame has the columns:
- cnt: the number of bikes rented in that hour (the outcome)
- hr: the hour of the day (0-23, as a factor)
- holiday: TRUE/FALSE
- workingday: TRUE if neither a holiday nor a weekend, else FALSE
- weathersit: categorical, "Clear to partly cloudy"/"Light Precipitation"/"Misty"
- temp: normalized temperature in Celsius
- atemp: normalized "feeling" temperature in Celsius
- hum: normalized humidity
- windspeed: normalized windspeed
- instant: the time index -- number of hours since beginning of dataset (not a variable)
- mnth and yr: month and year indices (not variables)
Remember that you must specify family = poisson or family = quasipoisson when using glm() to fit a count model.
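As a rough illustration of why the choice of family matters, the sketch below simulates overdispersed counts and compares their mean to their variance. A Poisson model assumes the two are equal; when the variance is much larger, quasipoisson is the usual choice. The data frame toy and its columns x and y are made up for this example and are not part of bikesJuly.

```r
# Toy data (hypothetical, not bikesJuly): counts whose variance exceeds their mean
set.seed(42)
toy <- data.frame(x = runif(500))
toy$y <- rnbinom(500, mu = exp(1 + 2 * toy$x), size = 1.5)

# Compare the mean and the variance of the outcome
mean_y <- mean(toy$y)
var_y <- var(toy$y)
c(mean = mean_y, variance = var_y)

# Fit the same model with both families
pois_model <- glm(y ~ x, data = toy, family = poisson)
quasi_model <- glm(y ~ x, data = toy, family = quasipoisson)

# The coefficient estimates agree; only the standard errors differ
coef(pois_model)
coef(quasi_model)

# Estimated dispersion: values well above 1 indicate overdispersion
summary(quasi_model)$dispersion
```

Because the point estimates are the same under both families, the predictions are too; the family mainly affects the standard errors and any inference based on them.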
Since there are a lot of input variables, for convenience we will specify the outcome and the inputs in variables, and use paste() to assemble a string representing the model formula.
The bikesJuly data frame is available to use. The names of the outcome variable and the input variables have also been loaded as the variables outcome and vars, respectively.
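The paste() pattern described above can be sketched on made-up names. Here outcome_toy and vars_toy are hypothetical stand-ins for the preloaded outcome and vars, so the example does not depend on the exercise data.

```r
# Hypothetical stand-ins for the preloaded `outcome` and `vars`
outcome_toy <- "y"
vars_toy <- c("x1", "x2", "x3")

# Join the inputs with " + ", then prepend the outcome and "~"
fmla_string <- paste(outcome_toy, "~", paste(vars_toy, collapse = " + "))
fmla_string  # "y ~ x1 + x2 + x3"

# as.formula() converts the string to a formula object explicitly
fmla_toy <- as.formula(fmla_string)
class(fmla_toy)  # "formula"
```

The inner paste() with collapse turns the vector of input names into a single string; the outer paste() then joins the three pieces with the default single-space separator.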
This exercise is part of the course
Supervised Learning in R: Regression
Exercise instructions
- Fill in the blanks to create the formula fmla expressing cnt as a function of the inputs. Print it.
- Calculate the mean (mean()) and variance (var()) of bikesJuly$cnt.
  - Should you use poisson or quasipoisson regression?
- Use glm() to fit a model to the bikesJuly data: bike_model.
- Use glance() to look at the model's fit statistics. Assign the output of glance() to the variable perf.
- Calculate the pseudo-R-squared of the model.
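For the last step, a common deviance-based definition of pseudo-R-squared is 1 - deviance / null.deviance, the GLM analogue of R-squared. The sketch below applies it to simulated counts; toy, toy_model, and pseudoR2_toy are hypothetical names. It reads the two deviances straight from the fitted model in base R, and broom's glance() reports the same quantities in its deviance and null.deviance columns.

```r
# Toy count data (hypothetical, not bikesJuly)
set.seed(1)
toy <- data.frame(x = runif(300))
toy$y <- rpois(300, lambda = exp(0.5 + 1.5 * toy$x))

# Fit a count model
toy_model <- glm(y ~ x, data = toy, family = quasipoisson)

# Residual deviance of the fitted model and deviance of the intercept-only model
dev <- toy_model$deviance
null_dev <- toy_model$null.deviance

# Pseudo-R-squared: the fraction of the null deviance explained by the model
pseudoR2_toy <- 1 - dev / null_dev
pseudoR2_toy
```

A value near 1 means the model explains most of the null deviance; a value near 0 means it does little better than predicting the overall mean.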
Hands-on interactive exercise
Try this exercise by completing this sample code.
# bikesJuly is available
str(bikesJuly)
# The outcome column
outcome 
# The inputs to use
vars 
# Create the formula string for bikes rented as a function of the inputs
(fmla <- paste(___, "~", paste(___, collapse = " + ")))
# Calculate the mean and variance of the outcome
(mean_bikes <- ___)
(var_bikes <- ___)
# Fit the model
bike_model <- ___
# Call glance
(perf <- ___)
# Calculate pseudo-R-squared
(pseudoR2 <- ___)