Exercise

vtreat the bike rental data

In this exercise, you will create one-hot-encoded data frames of the July and August bike data for use with xgboost later on.

The data frames bikesJuly and bikesAugust have been pre-loaded.

For your convenience, we have defined the variable vars, which holds the names of the input columns for the model.

Instructions

  • Load the package vtreat.
  • Use designTreatmentsZ() to create a treatment plan treatplan for the variables in vars from bikesJuly (the training data).
    • Set the flag verbose=FALSE to prevent the function from printing too many messages.
  • Fill in the blanks to create a vector newvars that contains only the names of the transformed variables whose code is "clean" or "lev" (see the varName and code columns of treatplan$scoreFrame). Print it.
  • Use prepare() to create a one-hot-encoded training data frame bikesJuly.treat.
    • Use the varRestriction argument to restrict the variables you will use to newvars.
  • Use prepare() to create a one-hot-encoded test frame bikesAugust.treat from bikesAugust in the same way.
  • Call str() on both prepared frames to see their structure.
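The steps above can be sketched in R as follows. This is a minimal sketch, not the course's reference solution: because bikesJuly, bikesAugust, and vars are pre-loaded in the exercise, the code first builds small hypothetical stand-ins (random columns hr, temp, and weathersit, plus an untreated outcome cnt) so that it runs on its own; in the exercise you would skip that part. It selects newvars with base R indexing on treatplan$scoreFrame instead of a pipeline.

```r
library(vtreat)

# --- Hypothetical stand-ins for the pre-loaded exercise data (skip in the exercise) ---
set.seed(42)
make_bikes <- function(n) {
  data.frame(
    hr         = sample(0:23, n, replace = TRUE),                        # numeric input
    temp       = round(runif(n, 0.5, 0.9), 2),                           # numeric input
    weathersit = sample(c("Clear", "Misty", "Light rain"), n, replace = TRUE),  # categorical input
    cnt        = rpois(n, 200),                                          # outcome, not treated
    stringsAsFactors = FALSE
  )
}
bikesJuly   <- make_bikes(200)                 # hypothetical training frame
bikesAugust <- make_bikes(200)                 # hypothetical test frame
vars <- c("hr", "temp", "weathersit")          # hypothetical variable list

# --- Exercise steps ---
# Design an unsupervised ("Z") treatment plan on the training data only
treatplan <- designTreatmentsZ(bikesJuly, vars, verbose = FALSE)

# scoreFrame lists every derived variable with its treatment code;
# keep only "clean" (cleaned numeric) and "lev" (one indicator per level)
sf <- treatplan$scoreFrame
newvars <- sf$varName[sf$code %in% c("clean", "lev")]
print(newvars)

# Apply the same plan to both frames, restricted to the selected variables
bikesJuly.treat   <- prepare(treatplan, bikesJuly,   varRestriction = newvars)
bikesAugust.treat <- prepare(treatplan, bikesAugust, varRestriction = newvars)

str(bikesJuly.treat)
str(bikesAugust.treat)
```

Because bikesAugust is prepared with the plan designed on bikesJuly, both treated frames get the same columns; a weathersit level seen only in August should get 0 in every indicator rather than a new column.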