Exercise

Encoding categorical columns III: DictVectorizer

Alright, one final trick before you dive into pipelines. The two-step process you just went through (LabelEncoder followed by OneHotEncoder) can be simplified by using a DictVectorizer.

Applying a DictVectorizer to a DataFrame that has been converted to a list of dictionaries (one per row) gives you label encoding and one-hot encoding in one go. Numeric columns pass through unchanged.

Your task in this exercise is to work through this strategy!

Instructions

100 XP
  • Import DictVectorizer from sklearn.feature_extraction.
  • Convert df into a dictionary called df_dict using its .to_dict() method with "records" as the argument.
  • Instantiate a DictVectorizer object called dv with the keyword argument sparse=False.
  • Apply the DictVectorizer on df_dict by using its .fit_transform() method.
  • Hit 'Submit Answer' to print the resulting first five rows and the vocabulary.
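
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise solution: the toy df and its column names (color, size) are hypothetical stand-ins for the DataFrame preloaded in the exercise, and df_encoded is just an illustrative variable name.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy data standing in for the exercise's preloaded df
df = pd.DataFrame({
    "color": ["red", "blue", "red"],   # categorical column
    "size": [1.0, 2.0, 3.0],           # numeric column
})

# One dictionary per row: [{"color": "red", "size": 1.0}, ...]
df_dict = df.to_dict("records")

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix
dv = DictVectorizer(sparse=False)

# String values are one-hot encoded; numeric values are kept as they are
df_encoded = dv.fit_transform(df_dict)

# First five rows (the toy data only has three)
print(df_encoded[:5, :])

# Mapping from feature name to column index in the encoded array
print(dv.vocabulary_)
```

Each string-valued column becomes one output column per distinct value, named in the form column=value, while numeric columns keep a single column. The vocabulary_ attribute tells you which output column corresponds to which feature.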