vtreat the bike rental data
In this exercise, you will create one-hot-encoded data frames of the July/August bike data, for use with xgboost later on.
The data frames bikesJuly and bikesAugust have been pre-loaded.
For your convenience, we have defined the variable vars with the list of variable columns for the model.
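Here is what "one-hot-encoded" means in practice. The following minimal base-R sketch uses model.matrix() rather than vtreat, and the data frame is hypothetical. Each level of a categorical column becomes its own 0/1 indicator column, while numeric columns pass through unchanged. xgboost expects all-numeric input like this, which is why the bike data is encoded first.

```r
# Hypothetical toy data standing in for a categorical column such as weathersit
toy <- data.frame(
  weathersit = factor(c("clear", "misty", "rain", "clear")),
  temp = c(0.5, 0.4, 0.3, 0.6)
)

# With the intercept removed ("- 1"), model.matrix() codes the first factor
# with one 0/1 column per level (full one-hot encoding); temp stays numeric
onehot <- model.matrix(~ weathersit + temp - 1, data = toy)
print(onehot)
```

Each row has exactly one 1 among the weathersit columns, which marks the level observed in that row.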
This exercise is part of the course
Supervised Learning in R: Regression
Exercise instructions
- Load the package vtreat.
- Use designTreatmentsZ() to create a treatment plan treatplan for the variables in vars from bikesJuly (the training data).
- Set the flag verbose = FALSE to prevent the function from printing too many messages.
- Create a vector newvars that contains only the names of the "clean" and "lev" transformed variables, and print it.
- Use prepare() to create a one-hot-encoded training data frame bikesJuly.treat.
- Use the varRestriction argument to restrict the variables you will use to newvars.
- Use prepare() to create a one-hot-encoded test frame bikesAugust.treat from bikesAugust in the same way.
- Call str() on both prepared data frames to see their structure.
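The scoreFrame step can be sketched in base R. The scoreFrame below is a hypothetical stand-in that keeps only the varName and code columns, and the variable names are made up. In vtreat's scoreFrame, "clean" marks a cleaned numeric variable and "lev" marks one indicator column per categorical level. Other codes such as "catP" (level prevalence) and "isBAD" (missing-value flag) are the rows the exercise drops. The base-R subset is equivalent to the filter() / use_series() pipeline in the exercise.

```r
# Hypothetical stand-in for treatplan$scoreFrame (the real one has more columns)
scoreFrame <- data.frame(
  varName = c("temp", "temp_isBAD", "color_catP", "color_lev_x_blue", "color_lev_x_red"),
  code    = c("clean", "isBAD", "catP", "lev", "lev"),
  stringsAsFactors = FALSE
)

# Keep only the "clean" and "lev" rows, then take their variable names
newvars <- scoreFrame$varName[scoreFrame$code %in% c("clean", "lev")]
print(newvars)
```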
Hands-on exercise
Try this exercise by completing the sample code.
# The outcome column
(outcome <- "cnt")
# The input columns
(vars <- c("hr", "holiday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed"))
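# Load the package vtreat (magrittr and dplyr supply %>%, use_series() and filter())
library(vtreat)
library(magrittr)
library(dplyr)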
# Create the treatment plan from bikesJuly (the training data)
treatplan <- designTreatmentsZ(bikesJuly, vars, verbose = FALSE)
# Get the "clean" and "lev" variables from the scoreFrame
(newvars <- treatplan %>%
  use_series(scoreFrame) %>%
  filter(code %in% c("clean", "lev")) %>%  # get the rows you care about
  use_series(varName))                     # get the varName column
# Prepare the training data
bikesJuly.treat <- prepare(treatplan, bikesJuly, varRestriction = newvars)
# Prepare the test data
bikesAugust.treat <- prepare(treatplan, bikesAugust, varRestriction = newvars)
# Call str() on the treated data
str(bikesJuly.treat)
str(bikesAugust.treat)
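Both prepare() calls reuse the treatment plan built from bikesJuly, so the training and test frames should end up with the same columns. The following base-R sketch shows what can go wrong otherwise. It uses hypothetical data and has model.matrix() stand in for vtreat. If you encode the test frame on its own, a level that happens not to occur in it is silently dropped. If you reuse the levels learned from the training data, the columns stay aligned.

```r
# Hypothetical train/test frames: the test rows never contain "rain"
train <- data.frame(weathersit = c("clear", "misty", "rain"), stringsAsFactors = FALSE)
test  <- data.frame(weathersit = c("clear", "clear", "misty"), stringsAsFactors = FALSE)

# Encoding the test frame independently yields only the levels it contains
naive_test <- model.matrix(~ weathersit - 1, data = test)

# Encode with a fixed set of levels taken from the training data;
# unused levels are kept as all-zero columns
encode <- function(df, lvls) {
  model.matrix(~ weathersit - 1,
               data = data.frame(weathersit = factor(df$weathersit, levels = lvls)))
}
train_levels <- sort(unique(train$weathersit))
train_mat <- encode(train, train_levels)
test_mat  <- encode(test, train_levels)

print(colnames(naive_test))
print(identical(colnames(train_mat), colnames(test_mat)))
```

vtreat's treatment plan plays the role of train_levels here: it records the encoding learned on bikesJuly so that prepare() can apply it unchanged to bikesAugust.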