Session Ready
Exercise

One-Hot encoding

The problem with label encoding is that it implicitly assumes that there is a ranking dependency between the categories. So, let's change the encoding method for the features "RoofStyle" and "CentralAir" to one-hot encoding. Again, the train and test DataFrames from House Prices Kaggle competition are already available in your workspace.

Recall that if you're dealing with binary features (categorical features with only two categories) it is suggested to apply label encoder only.

Your goal is to determine which of the mentioned features is not binary, and to apply one-hot encoding only to this one.

Instructions 1/4
undefined XP
  • 1
  • 2
  • 3
  • 4
  • Determine the distribution of "RoofStyle" and "CentralAir" features using pandas' value_counts() method.