Encoding categorical variables
There are a couple of columns in the UFO dataset that need to be encoded before they can be modeled with scikit-learn. You'll do that transformation here, using both binary and one-hot encoding methods.
This exercise is part of the course
Preprocessing for Machine Learning in Python
Instructions
- Using `apply()`, write a conditional `lambda` function that returns `1` if the value is `"us"`, and `0` otherwise.
- Print out the number of `.unique()` values in the `type` column.
- Using `pd.get_dummies()`, create a one-hot encoded set of the `type` column.
- Finally, use `pd.concat()` to concatenate the `type_set` encoded variables to the `ufo` dataset.
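As a rough illustration of the binary-encoding and unique-count steps, here is a minimal sketch on a toy DataFrame. The rows below are made up for the example and are not the course's UFO data; only the column names `country` and `type` come from the exercise.

```python
import pandas as pd

# Toy stand-in for the UFO dataset (hypothetical values, not the course data)
ufo = pd.DataFrame({
    "country": ["us", "ca", "us", "gb"],
    "type": ["light", "circle", "light", "triangle"],
})

# Binary encoding: 1 when the country is "us", 0 for anything else
ufo["country_enc"] = ufo["country"].apply(lambda val: 1 if val == "us" else 0)

# Number of distinct values in the type column
n_types = len(ufo["type"].unique())

print(ufo["country_enc"].tolist())  # [1, 0, 1, 0]
print(n_types)                      # 3
```

Binary encoding suits a column where only one distinction matters (here, "us" versus everything else), while a column with several distinct categories, such as `type`, is the usual candidate for one-hot encoding.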
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Use pandas to encode us values as 1 and others as 0
ufo["country_enc"] = ufo["country"].____
# Print the number of unique type values
print(len(____.unique()))
# Create a one-hot encoded set of the type values
type_set = ____
# Concatenate this set back to the ufo DataFrame
ufo = pd.concat([____, ____], axis=1)
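To show how the one-hot and concatenation steps fit together, the sketch below runs them on a small made-up DataFrame rather than the real UFO data. The `city` column and all row values are assumptions introduced for illustration; `type_set` and `ufo` follow the names used in the exercise.

```python
import pandas as pd

# Toy stand-in for the UFO dataset (hypothetical values, not the course data)
ufo = pd.DataFrame({
    "city": ["seattle", "toronto", "austin"],
    "type": ["light", "circle", "light"],
})

# One-hot encode the type column: one indicator column per distinct value
type_set = pd.get_dummies(ufo["type"])

# axis=1 appends the indicator columns side by side, aligned on the row index
ufo = pd.concat([ufo, type_set], axis=1)

print(ufo.columns.tolist())  # ['city', 'type', 'circle', 'light']
```

Passing `axis=1` matters here: the default `axis=0` would stack the two frames vertically instead of adding new columns. The dtype of the indicator columns (boolean or integer) depends on the pandas version.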