One-Hot encoding
The problem with label encoding is that it implicitly assumes that there is a ranking dependency between the categories. So, let's change the encoding method for the features "RoofStyle" and "CentralAir" to one-hot encoding. Again, the train
and test
DataFrames from House Prices Kaggle competition are already available in your workspace.
Recall that if you're dealing with binary features (categorical features with only two categories) it is suggested to apply label encoder only.
Your goal is to determine which of the mentioned features is not binary, and to apply one-hot encoding only to this one.
Diese Übung ist Teil des Kurses
Winning a Kaggle Competition in Python
Interaktive Übung
Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.
# Concatenate train and test together
houses = pd.concat([train, test])
# Look at feature distributions
print(houses['RoofStyle'].____, '\n')
print(houses['CentralAir'].____)