Encoding categorical variables

There are a couple of columns in the UFO dataset that need to be encoded before they can be modeled with scikit-learn. You'll do that transformation here, using both binary and one-hot encoding methods.

This exercise is part of the course

Preprocessing for Machine Learning in Python

Exercise instructions

  • Using apply() on the country column, write a conditional lambda function that returns 1 if the value is "us" and 0 otherwise.
  • Print the number of .unique() values in the type column.
  • Using pd.get_dummies(), create a one-hot encoded set of the type column.
  • Finally, use pd.concat() to concatenate the type_set encoded variables to the ufo dataset.
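
As a rough illustration of why the unique count is worth checking first, the sketch below runs pd.get_dummies() on a small made-up Series (the values are hypothetical, not taken from the UFO dataset). It should produce one indicator column per distinct value, with exactly one indicator set in each row.

```python
import pandas as pd

# Hypothetical toy values standing in for a categorical column
shapes = pd.Series(["light", "circle", "light", "triangle"], name="type")

# Number of distinct categories in the column
n_unique = len(shapes.unique())

# One-hot encoding: one indicator column per distinct value
dummies = pd.get_dummies(shapes)

# The column count of the encoded set matches the unique count
print(n_unique, dummies.shape[1])

# Each row has exactly one indicator switched on
print(dummies.sum(axis=1).tolist())
```

A column with many distinct values therefore adds many columns to the DataFrame, which is the practical reason to print the count before encoding.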

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Use pandas to encode us values as 1 and others as 0
ufo["country_enc"] = ufo["country"].____

# Print the number of unique type values
print(len(____.unique()))

# Create a one-hot encoded set of the type values
type_set = ____

# Concatenate this set back to the ufo DataFrame
ufo = pd.concat([____, ____], axis=1)
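
One possible way to complete the template is sketched below. Because the course's ufo DataFrame is not available here, the sketch builds a tiny stand-in with made-up rows; only the column names country and type are taken from the exercise, and everything else is an assumption for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the course's ufo DataFrame (rows are made up)
ufo = pd.DataFrame({
    "country": ["us", "ca", "us", "gb"],
    "type": ["light", "circle", "light", "triangle"],
})

# Use pandas to encode us values as 1 and others as 0
ufo["country_enc"] = ufo["country"].apply(lambda val: 1 if val == "us" else 0)

# Print the number of unique type values
print(len(ufo["type"].unique()))

# Create a one-hot encoded set of the type values
type_set = pd.get_dummies(ufo["type"])

# Concatenate this set back to the ufo DataFrame, column-wise
ufo = pd.concat([ufo, type_set], axis=1)
print(ufo.columns.tolist())
```

With axis=1, pd.concat() places the encoded columns alongside the existing ones and aligns rows by index, so type_set should keep the same index as ufo.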