vtreat the bike rental data
In this exercise, you will create one-hot-encoded data frames of the July and August bike data for use with xgboost later on.
The data frames bikesJuly and bikesAugust have been pre-loaded. For your convenience, we have defined the variable vars with the list of input columns for the model.
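One-hot encoding turns each level of a categorical variable into its own 0/1 indicator column. xgboost's R interface expects numeric input, which is why the bike data needs this step. A minimal illustration of the idea, assuming a made-up weathersit factor, uses base R's model.matrix(). This is a swapped-in technique for illustration only, not the vtreat approach used in this exercise.

```r
# Hypothetical categorical column with three levels
weathersit <- factor(c("Clear", "Mist", "Clear", "Light Rain"))

# "- 1" drops the intercept so every level gets its own 0/1 column
onehot <- model.matrix(~ weathersit - 1)
print(onehot)
```

Each row has exactly one 1, in the column for that row's level.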
This exercise is part of the course Supervised Learning in R: Regression.
Exercise instructions
- Load the package vtreat.
- Use designTreatmentsZ() to create a treatment plan treatplan for the variables in vars from bikesJuly (the training data).
  - Set the flag verbose=FALSE to prevent the function from printing too many messages.
- Fill in the blanks to create a vector newvars that contains only the names of the clean and lev transformed variables. Print it.
- Use prepare() to create a one-hot-encoded training data frame bikesJuly.treat.
  - Use the varRestriction argument to restrict the variables you will use to newvars.
- Use prepare() to create a one-hot-encoded test frame bikesAugust.treat from bikesAugust in the same way.
- Call str() on both prepared frames to see their structure.
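The steps above can be sketched end to end on a tiny, made-up data set. The frames train and test and the names toy_vars, toy_plan and toy_newvars are hypothetical stand-ins for bikesJuly, bikesAugust, vars, treatplan and newvars. The sketch assumes vtreat is installed and uses base R subsetting in place of the magrittr/dplyr pipeline from the sample code. The test frame deliberately contains a weather level never seen in training and a missing temp value.

```r
library(vtreat)

# Hypothetical toy training data standing in for bikesJuly
train <- data.frame(
  weathersit = c("Clear", "Mist", "Clear", "Light Rain"),
  temp       = c(0.60, 0.70, 0.65, 0.50),
  stringsAsFactors = FALSE
)

# Hypothetical toy test data standing in for bikesAugust:
# "Heavy Rain" never appears in train, and one temp value is missing
test <- data.frame(
  weathersit = c("Mist", "Heavy Rain"),
  temp       = c(0.55, NA),
  stringsAsFactors = FALSE
)

toy_vars <- c("weathersit", "temp")

# Design the treatment plan on the training data only (no outcome needed)
toy_plan <- designTreatmentsZ(train, toy_vars, verbose = FALSE)

# Keep only numeric pass-through ("clean") and level-indicator ("lev") variables
sf <- toy_plan$scoreFrame
toy_newvars <- sf$varName[sf$code %in% c("clean", "lev")]
print(toy_newvars)

# Apply the same plan to both frames, restricted to the selected variables
train.treat <- prepare(toy_plan, train, varRestriction = toy_newvars)
test.treat  <- prepare(toy_plan, test,  varRestriction = toy_newvars)

str(train.treat)
str(test.treat)
```

Because both frames go through the same plan, they end up with the same columns. The unseen level gets 0 in every indicator column, and the clean treatment replaces the missing temp with a value learned from the training data (to our understanding, the training mean), so the test frame has no NA values.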
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The outcome column
(outcome <- "cnt")
# The input columns
(vars <- c("hr", "holiday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed"))
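# Load the helper packages used below: magrittr (%>%, use_series()) and dplyr (filter())
library(magrittr)
library(dplyr)
# Load the package vtreat
___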
# Create the treatment plan from bikesJuly (the training data)
treatplan <- ___(___, ___, verbose = FALSE)
# Get the "clean" and "lev" variables from the scoreFrame
(newvars <- treatplan %>%
use_series(scoreFrame) %>%
filter(code %in% ___) %>% # get the rows you care about
use_series(___)) # get the varName column
# Prepare the training data
bikesJuly.treat <- ___(___, ___, varRestriction = ___)
# Prepare the test data
bikesAugust.treat <- ___(___, ___, varRestriction = ___)
# Call str() on the treated data
___
___
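The scoreFrame pipeline in the sample code chains magrittr's use_series(), an alias for `$`, with dplyr's filter(). A minimal sketch of what that chain does, assuming magrittr and dplyr are installed, runs it on a hypothetical hand-built list, toy_plan, whose scoreFrame only mimics the varName and code columns of a real vtreat plan. The rows are made up for illustration.

```r
library(magrittr)  # %>% and use_series()
library(dplyr)     # filter()

# Hypothetical stand-in for a treatment plan: a list holding a scoreFrame
toy_plan <- list(
  scoreFrame = data.frame(
    varName = c("temp", "temp_isBAD", "weathersit_catP", "weathersit_lev_x_Clear"),
    code    = c("clean", "isBAD", "catP", "lev"),
    stringsAsFactors = FALSE
  )
)

toy_newvars <- toy_plan %>%
  use_series(scoreFrame) %>%               # same as toy_plan$scoreFrame
  filter(code %in% c("clean", "lev")) %>%  # keep only clean and lev rows
  use_series(varName)                      # same as taking the $varName column

print(toy_newvars)
```

The result is a plain character vector of variable names, which is the form varRestriction expects.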