Encoding categorical variables

A couple of columns in the UFO dataset need to be encoded before they can be modeled with scikit-learn. You'll do that transformation here, using both binary and one-hot encoding.

This exercise is part of the course

Preprocessing for Machine Learning in Python

Exercise instructions

  • Using apply() on the country column, write a conditional lambda function that returns 1 if the value is "us" and 0 otherwise.
  • Print the number of .unique() values in the type column.
  • Using pd.get_dummies(), create a one-hot encoded set of the type column.
  • Finally, use pd.concat() to concatenate the one-hot encoded type_set columns to the ufo DataFrame.
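
As a rough illustration of the four steps above, here is a minimal sketch on a small made-up DataFrame. The toy rows and the n_types variable are assumptions for demonstration only; the real ufo dataset has more rows and columns. Passing dtype=int to pd.get_dummies() is also an optional choice made here so the indicator columns hold 0/1 integers, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical stand-in for the ufo DataFrame (not the real data)
ufo = pd.DataFrame({
    "country": ["us", "ca", "us", "gb"],
    "type": ["light", "circle", "light", "triangle"],
})

# Binary encoding: 1 for "us", 0 for any other value
ufo["country_enc"] = ufo["country"].apply(lambda val: 1 if val == "us" else 0)

# Count the distinct categories in the type column
n_types = len(ufo["type"].unique())
print(n_types)

# One-hot encoding: one indicator column per distinct type value
type_set = pd.get_dummies(ufo["type"], dtype=int)

# axis=1 joins column-wise, so each row keeps its original values
# and gains the new indicator columns
ufo = pd.concat([ufo, type_set], axis=1)
print(ufo.columns.tolist())
```

Binary encoding suits a column where only one value matters against all the rest, while one-hot encoding keeps every category distinct at the cost of one extra column per category.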

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Use pandas to encode us values as 1 and others as 0
ufo["country_enc"] = ufo["country"].____

# Print the number of unique type values
print(len(____.unique()))

# Create a one-hot encoded set of the type values
type_set = ____

# Concatenate this set back to the ufo DataFrame
ufo = pd.concat([____, ____], axis=1)