Encoding categorical columns III: DictVectorizer

Alright, one final trick before you dive into pipelines. The two-step process you just went through, LabelEncoder followed by OneHotEncoder, can be simplified by using a DictVectorizer.

Using a DictVectorizer on a DataFrame that has been converted to a dictionary allows you to get label encoding as well as one-hot encoding in one go.
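
To make the idea concrete, here is a minimal sketch on a made-up DataFrame. The names toy_df, toy_records, and toy_dv and the columns color and price are assumptions for illustration only and are not part of the exercise data. String-valued columns are expanded into one indicator column per category, while numeric columns are passed through unchanged.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy data: one categorical column and one numeric column
toy_df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "price": [10.0, 12.5, 8.0],
})

# One dictionary per row, e.g. {"color": "red", "price": 10.0}
toy_records = toy_df.to_dict("records")

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix
toy_dv = DictVectorizer(sparse=False)
toy_encoded = toy_dv.fit_transform(toy_records)

# Maps each generated feature name to its column index
print(toy_dv.vocabulary_)
print(toy_encoded)
```

Each string value becomes a feature named "column=value", so color turns into color=blue and color=red, and price keeps its numeric values.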

Your task is to work through this strategy in this exercise!

This exercise is part of the course

Extreme Gradient Boosting with XGBoost


Exercise instructions

Import DictVectorizer from sklearn.feature_extraction.
Convert df into a dictionary called df_dict using its .to_dict() method with "records" as the argument.
Instantiate a DictVectorizer object called dv with the keyword argument sparse=False.
Apply the DictVectorizer on df_dict by using its .fit_transform() method.
Hit 'Submit Answer' to print the resulting first five rows and the vocabulary.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import DictVectorizer
____

# Convert df into a dictionary: df_dict
df_dict = ____

# Create the DictVectorizer object: dv
dv = ____

# Apply dv on df_dict: df_encoded
df_encoded = ____

# Print the resulting first five rows
print(df_encoded[:5,:])

# Print the vocabulary
print(dv.vocabulary_)
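
The vocabulary is a dictionary from feature name to column index, which can be used to label the columns of the encoded array. The following is a rough sketch on made-up records. The names records, dv_demo, and labeled are hypothetical and separate from the exercise variables.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records, in the same shape that .to_dict("records") produces
records = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": 12.5},
]

dv_demo = DictVectorizer(sparse=False)
encoded = dv_demo.fit_transform(records)

# Order the feature names by their column index in the encoded array
column_names = sorted(dv_demo.vocabulary_, key=dv_demo.vocabulary_.get)

# Wrap the array in a DataFrame so each column carries its feature name
labeled = pd.DataFrame(encoded, columns=column_names)
print(labeled.head())
```

Sorting the vocabulary keys by their index recovers the column order, which makes the printed rows easier to read than the bare array.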
