CommencerCommencer gratuitement

Encoding categorical columns III: DictVectorizer

Alright, one final trick before you dive into pipelines. The two step process you just went through - LabelEncoder followed by OneHotEncoder - can be simplified by using a DictVectorizer.

Using a DictVectorizer on a DataFrame that has been converted to a dictionary allows you to get label encoding as well as one-hot encoding in one go.

Your task is to work through this strategy in this exercise!

Cet exercice fait partie du cours

Extreme Gradient Boosting with XGBoost

Afficher le cours

Instructions

  • Import DictVectorizer from sklearn.feature_extraction.
  • Convert df into a dictionary called df_dict using its .to_dict() method with "records" as the argument.
  • Instantiate a DictVectorizer object called dv with the keyword argument sparse=False.
  • Apply the DictVectorizer on df_dict by using its .fit_transform() method.
  • Hit 'Submit Answer' to print the resulting first five rows and the vocabulary.

Exercice interactif pratique

Essayez cet exercice en complétant cet exemple de code.

# Import DictVectorizer
____

# Convert df into a dictionary: df_dict
df_dict = ____

# Create the DictVectorizer object: dv
dv = ____

# Apply dv on df: df_encoded
df_encoded = ____

# Print the resulting first five rows
print(df_encoded[:5,:])

# Print the vocabulary
print(dv.vocabulary_)
Modifier et exécuter le code