
A tf-idf word-frequency array

In this exercise, you'll create a tf-idf word-frequency array for a toy collection of documents. To do this, use TfidfVectorizer from sklearn. It transforms a list of documents into a word-frequency array, which it returns as a csr_matrix. Like other sklearn objects, it has fit() and transform() methods.

You are given a list, documents, of toy documents about pets.
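To make the fit() and transform() behavior described above concrete, here is a minimal sketch. The corpus sample_docs is a hypothetical stand-in, not the exercise's documents list, and the variable names are illustrative only. It assumes scikit-learn, SciPy, and NumPy are installed.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in corpus; NOT the exercise's `documents` list.
sample_docs = ["cats say meow", "dogs say woof", "dogs chase cats"]

# Two-step form: fit() learns the vocabulary and idf weights,
# transform() then encodes the documents as tf-idf rows.
vectorizer = TfidfVectorizer()
vectorizer.fit(sample_docs)
two_step = vectorizer.transform(sample_docs)

# One-step form on a fresh vectorizer.
one_step = TfidfVectorizer().fit_transform(sample_docs)

# The result is a SciPy sparse matrix with one row per document
# and one column per distinct word.
print(issparse(two_step))
print(two_step.shape)

# Both forms should produce the same values.
print(np.allclose(two_step.toarray(), one_step.toarray()))
```

The sparse format only stores nonzero entries, which matters because most words do not appear in most documents.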

This exercise is part of the course

Unsupervised Learning in Python


Exercise instructions

  • Import TfidfVectorizer from sklearn.feature_extraction.text.
  • Create a TfidfVectorizer instance called tfidf.
  • Apply the .fit_transform() method of tfidf to documents and assign the result to csr_mat. This is a word-frequency array in csr_matrix format.
  • Inspect csr_mat by calling its .toarray() method and printing the result. This has been done for you.
  • The columns of the array correspond to words. Get the list of words by calling the .get_feature_names_out() method of tfidf, and assign the result to words.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import TfidfVectorizer
from ____ import ____

# Create a TfidfVectorizer: tfidf
tfidf = ____ 

# Apply fit_transform to documents: csr_mat
csr_mat = ____

# Print result of toarray() method
print(csr_mat.toarray())

# Get the words: words
words = ____

# Print words
print(words)