Encoding categorical columns III: DictVectorizer
Alright, one final trick before you dive into pipelines. The two-step process you just went through - LabelEncoder followed by OneHotEncoder - can be simplified by using a DictVectorizer.
Using a DictVectorizer on a DataFrame that has been converted to a dictionary allows you to get label encoding as well as one-hot encoding in one go.
Your task is to work through this strategy in this exercise!
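To make the "one go" idea concrete, here is a minimal sketch on a few hand-written records. The records, the field names color and size, and the variable name dv_demo are hypothetical and are not part of the course data. It assumes scikit-learn is installed, and it shows how string values are expanded into one column per category while numeric values are passed through unchanged.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy records (not the course's df): one string field, one numeric field
records = [
    {"color": "red", "size": 1.0},
    {"color": "blue", "size": 2.0},
    {"color": "red", "size": 3.0},
]

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix
dv_demo = DictVectorizer(sparse=False)
encoded = dv_demo.fit_transform(records)

# String values get one "field=value" column per category;
# numeric values keep a single column named after the field
print(dv_demo.feature_names_)  # ['color=blue', 'color=red', 'size']
print(encoded)
```

Each row of encoded has a 1.0 in the column matching its color and the original number in the size column, so no separate LabelEncoder step is needed.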
This exercise is part of the course
Extreme Gradient Boosting with XGBoost
Exercise instructions
- Import DictVectorizer from sklearn.feature_extraction.
- Convert df into a dictionary called df_dict using its .to_dict() method with "records" as the argument.
- Instantiate a DictVectorizer object called dv with the keyword argument sparse=False.
- Apply the DictVectorizer on df_dict by using its .fit_transform() method.
- Hit 'Submit Answer' to print the resulting first five rows and the vocabulary.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import DictVectorizer
____
# Convert df into a dictionary: df_dict
df_dict = ____
# Create the DictVectorizer object: dv
dv = ____
# Apply dv on df_dict: df_encoded
df_encoded = ____
# Print the resulting first five rows
print(df_encoded[:5,:])
# Print the vocabulary
print(dv.vocabulary_)
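For reference, one possible way to fill in the blanks is sketched below. The exercise's df is loaded by the course environment and is not shown here, so this sketch builds a small hypothetical stand-in DataFrame (the column names city and area are assumptions) so that it can run on its own. It assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical stand-in for the exercise's df (the real one comes from the course)
df = pd.DataFrame({
    "city": ["A", "B", "A"],
    "area": [8450.0, 9600.0, 11250.0],
})

# Convert df into a list of per-row dictionaries: df_dict
df_dict = df.to_dict("records")

# Create the DictVectorizer object: dv
dv = DictVectorizer(sparse=False)

# Apply dv on df_dict: df_encoded
df_encoded = dv.fit_transform(df_dict)

# Print the resulting first five rows
print(df_encoded[:5, :])

# Print the vocabulary (maps each output column name to its column index)
print(dv.vocabulary_)
```

With "records" as the argument, .to_dict() returns one dictionary per row, which is the input shape DictVectorizer expects. The vocabulary tells you which column of df_encoded corresponds to which feature, for example city=A versus city=B.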