
vtreat the bike rental data

In this exercise, you will create one-hot-encoded data frames from the July and August bike rental data, for use with xgboost later on.

The data frames bikesJuly and bikesAugust have been pre-loaded.

For your convenience, the variable vars has been defined; it holds the names of the input columns for the model.
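As a rough illustration of what "one-hot-encoded" means, the sketch below uses base R's model.matrix() on a small made-up data frame. The data frame toy and its values are assumptions for illustration only, not the course data, and vtreat's lev variables play a similar role rather than being produced this way.

```r
# Hypothetical toy data; the values are invented for illustration
toy <- data.frame(
  weathersit = c("Clear", "Misty", "Clear", "Rain"),
  temp       = c(0.70, 0.62, 0.74, 0.50),
  stringsAsFactors = TRUE
)

# "- 1" drops the intercept, so every level gets its own 0/1 indicator column
onehot <- model.matrix(~ weathersit - 1, data = toy)
print(onehot)
```

Each row has exactly one indicator set to 1, which is the all-numeric form that xgboost expects for categorical inputs.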

This exercise is part of the course

Supervised Learning in R: Regression


Exercise instructions

  • Load the package vtreat.
  • Use designTreatmentsZ() to create a treatment plan treatplan for the variables in vars from bikesJuly (the training data).
    • Set the flag verbose=FALSE to prevent the function from printing too many messages.
  • Fill in the blanks to create a vector newvars that contains only the names of the clean and lev transformed variables. Print it.
  • Use prepare() to create a one-hot-encoded training data frame bikesJuly.treat.
    • Use the varRestriction argument to restrict the variables you will use to newvars.
  • Use prepare() to create a one-hot-encoded test frame bikesAugust.treat from bikesAugust in the same way.
  • Call str() on both prepared data frames to see the structure.
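The steps above can be sketched on a small made-up data set. This is a minimal sketch, not the exercise solution: toy_train, toy_test, toy_vars, and toy_newvars are hypothetical names and values standing in for the pre-loaded bike data, and base-R subsetting is used in place of the magrittr pipeline from the sample code.

```r
library(vtreat)

# Hypothetical stand-ins for bikesJuly (training) and bikesAugust (test)
toy_train <- data.frame(
  weathersit = c("Clear", "Misty", "Clear", "Rain",
                 "Misty", "Clear", "Rain", "Clear"),
  temp       = c(0.70, 0.62, NA, 0.50, 0.58, 0.74, 0.48, 0.66),
  stringsAsFactors = FALSE
)
toy_test <- data.frame(
  weathersit = c("Misty", "Snow"),  # "Snow" never appears in the training data
  temp       = c(0.60, 0.40),
  stringsAsFactors = FALSE
)
toy_vars <- c("weathersit", "temp")

# Design the treatment plan on the training data only (no outcome needed)
plan <- designTreatmentsZ(toy_train, toy_vars, verbose = FALSE)

# Keep only the "clean" (numeric) and "lev" (indicator) variables
sf <- plan$scoreFrame
toy_newvars <- sf$varName[sf$code %in% c("clean", "lev")]
print(toy_newvars)

# Apply the same plan to both the training and the test data
toy_train.treat <- prepare(plan, toy_train, varRestriction = toy_newvars)
toy_test.treat  <- prepare(plan, toy_test,  varRestriction = toy_newvars)

str(toy_train.treat)
str(toy_test.treat)
```

Because the plan is designed once and then applied to both frames, the treated training and test frames end up with the same columns, even when the test data contains a level that was not seen in training.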

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# The outcome column
(outcome <- "cnt")

# The input columns
(vars <- c("hr", "holiday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed"))

# Load the package vtreat
___

# Create the treatment plan from bikesJuly (the training data)
treatplan <- ___(___, ___, verbose = FALSE)

# Get the "clean" and "lev" variables from the scoreFrame
# (%>% and use_series() come from magrittr; filter() comes from dplyr)
(newvars <- treatplan %>%
  use_series(scoreFrame) %>%
  filter(code %in% ___) %>%  # get the rows you care about
  use_series(___))           # get the varName column

# Prepare the training data
bikesJuly.treat <- ___(___, ___, varRestriction = ___)

# Prepare the test data
bikesAugust.treat <- ___(___, ___, varRestriction = ___)

# Call str() on the treated data
___
___