Mapping feature indices with feature names
In the lesson video, we saw that CountVectorizer doesn't necessarily index the vocabulary in alphabetical order. In this exercise, we will learn to map each feature index to its corresponding feature name from the vocabulary.
We will use the same three sentences on lions from the video. The sentences are available in a list named corpus, which has already been printed to the console.
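One way to see the index-to-name relationship directly is to invert the fitted vectorizer's vocabulary_ attribute, which maps each term to its column index. The following is a minimal sketch, not the course's solution; the three sentences are a hypothetical stand-in for the preloaded corpus, which is not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in sentences; the exercise's own `corpus` is preloaded.
corpus = [
    "the lion is a big cat",
    "lions live in groups",
    "a lion can roar",
]

vectorizer = CountVectorizer()
vectorizer.fit(corpus)

# vocabulary_ maps term -> column index; invert it to get index -> term.
index_to_name = {index: term for term, index in vectorizer.vocabulary_.items()}

# Print the mapping in column order.
for index in sorted(index_to_name):
    print(index, index_to_name[index])
```

Printing the mapping in index order makes it easy to check which word each column of the bag-of-words matrix counts.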
This exercise is part of the course Feature Engineering for NLP in Python.
Exercise instructions
- Instantiate a CountVectorizer object. Name it vectorizer.
- Using fit_transform(), generate bow_matrix for corpus.
- Using the get_feature_names() method, map the column names to the corresponding words in the vocabulary. (scikit-learn 1.2 and later provide only get_feature_names_out(), which replaces this method.)
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create CountVectorizer object
vectorizer = ____
# Generate matrix of word vectors
bow_matrix = vectorizer.____(____)
# Convert bow_matrix into a DataFrame
bow_df = pd.DataFrame(bow_matrix.toarray())
# Map the column names to vocabulary
bow_df.columns = vectorizer.____
# Print bow_df
print(bow_df)
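
For reference, the steps above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions rather than the official solution: the corpus is a hypothetical stand-in for the preloaded list, and the hasattr check is an addition so the code runs on both older scikit-learn releases (get_feature_names()) and newer ones (get_feature_names_out()).

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in sentences; the exercise's own `corpus` is preloaded.
corpus = [
    "the lion is a big cat",
    "lions live in groups",
    "a lion can roar",
]

# Create CountVectorizer object
vectorizer = CountVectorizer()

# Generate matrix of word vectors (sparse document-term counts)
bow_matrix = vectorizer.fit_transform(corpus)

# Convert bow_matrix into a DataFrame
bow_df = pd.DataFrame(bow_matrix.toarray())

# Map the column names to vocabulary.
# Assumption: prefer get_feature_names_out() when the installed
# scikit-learn has it, otherwise fall back to get_feature_names().
if hasattr(vectorizer, "get_feature_names_out"):
    feature_names = list(vectorizer.get_feature_names_out())
else:
    feature_names = vectorizer.get_feature_names()
bow_df.columns = feature_names

# Print bow_df
print(bow_df)
```

Each row of bow_df corresponds to one sentence and each named column to one vocabulary word, so a cell holds the number of times that word occurs in that sentence.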