Encoding categorical variables
There are a couple of columns in the UFO dataset that need to be encoded before they can be modeled with scikit-learn. You'll do that transformation here, using both binary and one-hot encoding methods.
This exercise is part of the course
Preprocessing for Machine Learning in Python
Exercise instructions
- Using apply(), write a conditional lambda function that returns 1 if the value is "us", else 0.
- Print out the number of .unique() values in the type column.
- Using pd.get_dummies(), create a one-hot encoded set of the type column.
- Finally, use pd.concat() to concatenate the type_set encoded variables to the ufo dataset.
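The first two steps can be sketched on a small toy DataFrame. This is only an illustration: the rows below are invented stand-ins for the real ufo data, and the variable name n_types is an assumption made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the ufo DataFrame; these values are invented.
ufo = pd.DataFrame({
    "country": ["us", "ca", "us", "gb"],
    "type": ["light", "circle", "light", "triangle"],
})

# Binary encoding: 1 where country is "us", 0 for every other value.
ufo["country_enc"] = ufo["country"].apply(lambda val: 1 if val == "us" else 0)

# Count the distinct categories in the type column.
n_types = len(ufo["type"].unique())

print(ufo["country_enc"].tolist())
print(n_types)
```

Binary encoding suits a column where only one distinction matters ("us" versus everything else). A column with many distinct categories, such as type, is usually a better fit for one-hot encoding.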
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use pandas to encode us values as 1 and others as 0
ufo["country_enc"] = ufo["country"].____
# Print the number of unique type values
print(len(____.unique()))
# Create a one-hot encoded set of the type values
type_set = ____
# Concatenate this set back to the ufo DataFrame
ufo = pd.concat([____, ____], axis=1)
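As a rough sketch of the last two steps, the following toy example one-hot encodes a type column and attaches the result to the original frame. The data is invented for illustration and does not come from the real ufo dataset.

```python
import pandas as pd

# Hypothetical stand-in for the ufo DataFrame; these values are invented.
ufo = pd.DataFrame({
    "country_enc": [1, 0, 1],
    "type": ["light", "circle", "light"],
})

# One-hot encode: one indicator column per distinct type value.
type_set = pd.get_dummies(ufo["type"])

# axis=1 places the indicator columns side by side with the original
# columns, aligned on the row index.
ufo = pd.concat([ufo, type_set], axis=1)

print(ufo.columns.tolist())
```

The indicator columns are boolean in recent pandas releases and integer in older ones, so cast them with astype(int) if a downstream step expects numeric 0/1 values.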