Exercise

Removing near-zero-variance features

An easy unsupervised way to remove irrelevant features is to check which ones have zero or near-zero variance.

Zero-variance features have only a single unique value, so they carry no meaningful information. They can also cause some models to crash or become unstable.

Near-zero-variance features are dominated by one value, with a few other values that occur very rarely. These features can mislead model training, and they can even become zero-variance when the data is split into multiple subsets for validation, because a subset may end up without any of the rare values.
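To make the two definitions concrete, here is a minimal base R sketch. The toy data frame and the helper describe_feature() are hypothetical and are not part of the exercise or of the apps dataset. The sketch counts unique values per column and compares the count of the most common value with the count of the second most common one.

```r
# Toy data (hypothetical, not the apps dataset): 100 rows, three columns.
toy <- data.frame(
  constant = rep(1, 100),             # one unique value: zero variance
  rare     = c(rep(0, 98), 1, 2),     # one dominant value plus two rare ones
  varied   = rep(1:10, times = 10)    # ten values, all equally frequent
)

# Hypothetical helper: summarise how concentrated a single column is.
describe_feature <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  data.frame(
    n_unique       = length(counts),
    # Most common count divided by second most common count;
    # undefined (NA) when the column has a single unique value.
    freq_ratio     = if (length(counts) > 1) counts[[1]] / counts[[2]] else NA_real_,
    # Unique values as a percentage of the number of rows.
    percent_unique = 100 * length(counts) / length(x)
  )
}

feature_summary <- do.call(rbind, lapply(toy, describe_feature))
print(feature_summary)
```

In this sketch, constant has one unique value, while rare has a very large frequency ratio together with a small share of unique values, which is the pattern the paragraph above describes.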

Fortunately, the caret package provides the nearZeroVar() function, which makes this task quite easy. Try it out on a modified version of the Google Apps dataset, available in your workspace as apps. The dplyr package has also been loaded.
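As a rough illustration of how nearZeroVar() can be called, the sketch below runs it on a small made-up data frame rather than on apps, whose columns are not shown here. The toy data and the variable names nzv_metrics, nzv_cols, and toy_filtered are assumptions for the example, and the function's default cutoffs are left unchanged.

```r
library(caret)

# Toy data (hypothetical, not the apps dataset).
toy <- data.frame(
  constant = rep(1, 100),             # zero variance
  rare     = c(rep(0, 98), 1, 2),     # near-zero variance
  varied   = rep(1:10, times = 10)    # ordinary feature
)

# saveMetrics = TRUE returns one row per column with the
# freqRatio, percentUnique, zeroVar, and nzv diagnostics.
nzv_metrics <- nearZeroVar(toy, saveMetrics = TRUE)
print(nzv_metrics)

# names = TRUE returns the names of the flagged columns
# instead of their positions.
nzv_cols <- nearZeroVar(toy, names = TRUE)

# Keep only the columns that were not flagged.
toy_filtered <- toy[, setdiff(names(toy), nzv_cols), drop = FALSE]
print(names(toy_filtered))
```

Inspecting the metrics table before dropping anything is usually worthwhile, since it shows why each column was flagged and whether the freqCut and uniqueCut arguments need adjusting for the data at hand.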

Instructions 1/4
  • Use glimpse() to take a look at the apps data frame.