Encoding categorical variables
There are a couple of columns in the UFO dataset that need to be encoded before they can be modeled with scikit-learn. You'll do that transformation here, using both binary and one-hot encoding methods.
This exercise is part of the course
Preprocessing for Machine Learning in Python
Exercise instructions
- Using apply(), write a conditional lambda function that returns 1 if the value is "us", else 0.
- Print out the number of .unique() values in the type column.
- Using pd.get_dummies(), create a one-hot encoded set of the type column.
- Finally, use pd.concat() to concatenate the type_set encoded variables to the ufo dataset.
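To see what the first two steps do, here is a minimal sketch on a toy DataFrame. The four rows and their values are made up for illustration and are not the course's UFO data; only the column names country and type come from the exercise.

```python
import pandas as pd

# Toy stand-in for the ufo DataFrame (hypothetical values, not the course data)
ufo = pd.DataFrame({
    "country": ["us", "ca", "us", "gb"],
    "type": ["light", "circle", "light", "triangle"],
})

# Binary encoding: 1 where country is "us", 0 for every other value
ufo["country_enc"] = ufo["country"].apply(lambda val: 1 if val == "us" else 0)

# Count the distinct categories in the type column
n_types = len(ufo["type"].unique())

print(ufo["country_enc"].tolist())
print(n_types)
```

The unique count tells you how many columns a one-hot encoding of type would produce, which is worth checking before you widen the DataFrame.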
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use pandas to encode us values as 1 and others as 0
ufo["country_enc"] = ufo["country"].____
# Print the number of unique type values
print(len(____.unique()))
# Create a one-hot encoded set of the type values
type_set = ____
# Concatenate this set back to the ufo DataFrame
ufo = pd.concat([____, ____], axis=1)
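As a rough illustration of the one-hot and concatenation steps, the sketch below runs them on a small made-up DataFrame. The data is an assumption for demonstration only; the names ufo and type_set mirror the sample code.

```python
import pandas as pd

# Toy stand-in for the ufo DataFrame (hypothetical values, not the course data)
ufo = pd.DataFrame({
    "country": ["us", "ca", "us"],
    "type": ["light", "circle", "light"],
})

# One-hot encoding: one indicator column per distinct type value
type_set = pd.get_dummies(ufo["type"])

# axis=1 places the indicator columns next to the original columns,
# aligning rows on the shared index
ufo = pd.concat([ufo, type_set], axis=1)

print(ufo.columns.tolist())
```

The indicator columns are named after the category values and appear in sorted order. Their dtype depends on the pandas version (bool from pandas 2.0, uint8 before that), so cast with astype(int) if you need 0/1 integers.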