A tf-idf word-frequency array
In this exercise, you'll create a tf-idf word-frequency array for a toy collection of documents. For this, use the TfidfVectorizer from sklearn. It transforms a list of documents into a word-frequency array, which it outputs as a csr_matrix. It has fit() and transform() methods like other sklearn objects.
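As a rough illustration of the fit()/transform() pattern mentioned above, the sketch below compares the two-step call with the one-step fit_transform() shortcut and checks that the output is a sparse matrix in CSR format. The list sample_docs is a hypothetical stand-in invented for this sketch; it is not the documents list supplied by the exercise.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents (an assumption, not the exercise data)
sample_docs = ["cats say meow", "dogs say woof", "dogs chase cats"]

# Two-step version: learn the vocabulary and idf weights, then transform
two_step = TfidfVectorizer()
two_step.fit(sample_docs)
mat_two_step = two_step.transform(sample_docs)

# One-step version: fit_transform() does both in a single call
mat_one_step = TfidfVectorizer().fit_transform(sample_docs)

# The result is a scipy sparse matrix stored in CSR format
print(issparse(mat_two_step), mat_two_step.format)

# One row per document, one column per distinct word
print(mat_two_step.shape)

# Both routes should give the same tf-idf values
same = np.allclose(mat_two_step.toarray(), mat_one_step.toarray())
print(same)
```

Calling fit() and transform() separately is mostly useful when the vocabulary learned from one set of documents needs to be applied to another; for a single collection, fit_transform() is the shorter route.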
You are given a list, documents, of toy documents about pets.
This exercise is part of the course Unsupervised Learning in Python.
Exercise instructions
- Import TfidfVectorizer from sklearn.feature_extraction.text.
- Create a TfidfVectorizer instance called tfidf.
- Apply the .fit_transform() method of tfidf to documents and assign the result to csr_mat. This is a word-frequency array in csr_matrix format.
- Inspect csr_mat by calling its .toarray() method and printing the result. This has been done for you.
- The columns of the array correspond to words. Get the list of words by calling the .get_feature_names_out() method of tfidf, and assign the result to words.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import TfidfVectorizer
from ____ import ____
# Create a TfidfVectorizer: tfidf
tfidf = ____
# Apply fit_transform to documents: csr_mat
csr_mat = ____
# Print result of toarray() method
print(csr_mat.toarray())
# Get the words: words
words = ____
# Print words
print(words)