Novel levels

When a level of a categorical variable is rare, it may not appear in the training data at all. If that level then appears in future data, downstream models may not know what to do with it. When such novel levels appear, one-hot-encoding with model.matrix or caret::dummyVars will not work correctly.
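The problem can be seen in a minimal base R sketch. The data frames train and test below are hypothetical (they are not the course's dframe and testframe), and the level "y" is assumed to occur only in the test data. Encoding each frame separately with model.matrix yields column sets that do not line up:

```r
# Hypothetical data: the level "y" never appears in training.
train <- data.frame(color = c("b", "r", "r", "g", "b"), stringsAsFactors = FALSE)
test  <- data.frame(color = c("g", "y", "b"), stringsAsFactors = FALSE)

# One indicator column per level found in each frame (no intercept).
train_mm <- model.matrix(~ color - 1, data = train)
test_mm  <- model.matrix(~ color - 1, data = test)

# Columns the training encoding has never seen.
unseen_cols <- setdiff(colnames(test_mm), colnames(train_mm))

# Columns the training encoding has but the test encoding lacks.
missing_cols <- setdiff(colnames(train_mm), colnames(test_mm))

print(unseen_cols)
print(missing_cols)
```

A model fit on the training matrix expects the training columns, so the test matrix cannot be passed to it without further handling.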

vtreat is a "safer" alternative to model.matrix for one-hot-encoding, because it can manage novel levels safely. vtreat also manages missing values in the data (both categorical and continuous).
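As a rough illustration of that behavior, the sketch below builds a treatment plan on made-up data and applies it to a frame containing an unseen level. The data values and the helper names sf, lev_cols, and novel_rows are assumptions for illustration only; the course's own dframe, testframe, treatplan, and newvars may differ. It assumes the vtreat package is installed.

```r
library(vtreat)

# Hypothetical data: "y" appears only in testframe.
dframe <- data.frame(
  color = c("b", "r", "r", "r", "r", "b", "r", "g", "b", "b"),
  size  = c(13, 11, 15, 14, 13, 11, 9, 12, 7, 12),
  stringsAsFactors = FALSE
)
testframe <- data.frame(
  color = c("g", "g", "y", "g", "g", "y", "b", "g", "g", "r"),
  size  = c(7, 8, 10, 12, 6, 8, 12, 12, 12, 8),
  stringsAsFactors = FALSE
)

# Design a treatment plan without an outcome variable.
treatplan <- designTreatmentsZ(dframe, c("color", "size"), verbose = FALSE)

# Keep only the cleaned numeric ("clean") and indicator ("lev") variables.
sf <- treatplan$scoreFrame
newvars <- sf$varName[sf$code %in% c("clean", "lev")]

# Apply the plan to data that contains the unseen level.
testframe.treat <- prepare(treatplan, testframe, varRestriction = newvars)

# Inspect the indicator columns for the rows holding the novel level.
lev_cols <- grep("_lev_", colnames(testframe.treat), value = TRUE)
novel_rows <- testframe.treat[testframe$color == "y", lev_cols, drop = FALSE]
print(novel_rows)
```

Because the plan is designed on dframe alone, the treated test frame has the same columns as the treated training frame, whatever levels the test data contains.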

In this exercise, you will see how vtreat handles categorical values that did not appear in the training set. The treatment plan treatplan and the set of variables newvars from the previous exercise are still available. dframe and a new data frame testframe have been pre-loaded.

This exercise is part of the course

Supervised Learning in R: Regression

Exercise instructions

  • Print dframe and testframe.
    • Are there colors in testframe that didn't appear in dframe?
  • Call prepare() to create a one-hot-encoded version of testframe (without the outcome). Call it testframe.treat and print it.
    • Use the varRestriction argument to restrict to only the variables in newvars.
    • How are the yellow rows encoded?

Hands-on interactive exercise

Try this exercise by completing the following sample code.

# treatplan is available
summary(treatplan)

# newvars is available
newvars

# Print dframe and testframe
___
___

# Use prepare() to one-hot-encode testframe
(testframe.treat <- ___(___, ___, varRestriction = ___))