
Encoding categorical variables

There are a couple of columns in the UFO dataset that need to be encoded before they can be modeled with scikit-learn. You'll do that transformation here, using both binary and one-hot encoding methods.

This exercise is part of the course

Preprocessing for Machine Learning in Python


Instructions

  • Using apply() on the country column, write a conditional lambda function that returns 1 if the value is "us" and 0 otherwise.
  • Print the number of unique values in the type column, using .unique().
  • Using pd.get_dummies(), create a one-hot encoded set of the type column.
  • Finally, use pd.concat() to concatenate the type_set encoded variables to the ufo dataset.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Use pandas to encode us values as 1 and others as 0
ufo["country_enc"] = ufo["country"].____

# Print the number of unique type values
print(len(____.unique()))

# Create a one-hot encoded set of the type values
type_set = ____

# Concatenate this set back to the ufo DataFrame
ufo = pd.concat([____, ____], axis=1)
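
For reference, the four steps can be sketched end to end on a small stand-in DataFrame. This is only an illustration under stated assumptions: the rows below are made up and are not the course's UFO data, and the names `val` and `n_types` are hypothetical helpers introduced here. It shows one way the blanks might be filled, not necessarily the course's official solution.

```python
import pandas as pd

# Toy stand-in for the UFO dataset (invented rows, not the course data)
ufo = pd.DataFrame({
    "country": ["us", "ca", "us", "gb"],
    "type": ["light", "circle", "light", "triangle"],
})

# Binary encoding: 1 where country is "us", 0 for every other value
ufo["country_enc"] = ufo["country"].apply(lambda val: 1 if val == "us" else 0)

# Count the distinct values in the type column
n_types = len(ufo["type"].unique())
print(n_types)

# One-hot encoding: one indicator column per distinct type value
type_set = pd.get_dummies(ufo["type"])

# Append the indicator columns to the original DataFrame, column-wise
ufo = pd.concat([ufo, type_set], axis=1)
print(ufo.head())
```

Binary encoding suits the country column because only one distinction ("us" versus everything else) is kept, while one-hot encoding gives each type value its own column so that no ordering is implied between types.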