One-Hot encoding
The problem with label encoding is that it implicitly assumes that there is a ranking dependency between the categories. So, let's change the encoding method for the features "RoofStyle" and "CentralAir" to one-hot encoding. Again, the train
and test
DataFrames from House Prices Kaggle competition are already available in your workspace.
Recall that if you're dealing with binary features (categorical features with only two categories) it is suggested to apply label encoder only.
Your goal is to determine which of the mentioned features is not binary, and to apply one-hot encoding only to this one.
This exercise is part of the course
Winning a Kaggle Competition in Python
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Concatenate train and test together
houses = pd.concat([train, test])
# Look at feature distributions
print(houses['RoofStyle'].____, '\n')
print(houses['CentralAir'].____)