Exercise

A tf-idf word-frequency array

In this exercise, you'll create a tf-idf word-frequency array for a toy collection of documents. To do so, use TfidfVectorizer from sklearn. It transforms a list of documents into a word-frequency array, which it outputs as a csr_matrix. Like other sklearn objects, it has fit() and transform() methods.
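As a rough sketch of the fit() and transform() behavior described above, the snippet below fits a vectorizer and transforms in two separate steps, then compares the result with the one-step fit_transform(). The list sample_docs is a hypothetical stand-in, not the exercise's own data.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in documents (an assumption for illustration only).
sample_docs = ["cats say meow", "dogs say woof", "dogs chase cats"]

# Two-step route: fit() learns the vocabulary and idf weights,
# transform() encodes each document as a row of tf-idf values.
vectorizer = TfidfVectorizer()
vectorizer.fit(sample_docs)
two_step = vectorizer.transform(sample_docs)

# One-step route on a fresh vectorizer.
one_step = TfidfVectorizer().fit_transform(sample_docs)

print(issparse(two_step))   # the result is a SciPy sparse matrix
print(two_step.shape)       # one row per document, one column per distinct word
print(np.allclose(two_step.toarray(), one_step.toarray()))
```

Both routes should give the same values; fit_transform() is simply the more convenient call when the same documents are used for fitting and transforming.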

You are given a list, documents, of toy documents about pets. Its contents have been printed in the IPython Shell.

Instructions

  • Import TfidfVectorizer from sklearn.feature_extraction.text.
  • Create a TfidfVectorizer instance called tfidf.
  • Apply the .fit_transform() method of tfidf to documents and assign the result to csr_mat. This is a word-frequency array in csr_matrix format.
  • Inspect csr_mat by calling its .toarray() method and printing the result. This has been done for you.
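  • The columns of the array correspond to words. Get the list of words by calling the .get_feature_names() method of tfidf, and assign the result to words. (In scikit-learn 1.0 and later this method is named .get_feature_names_out(); .get_feature_names() was removed in 1.2.)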
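The steps above might be sketched as follows. The documents list here is an assumed placeholder, since in the exercise it is already defined for you. The lookup of the column labels tries .get_feature_names_out() first, because recent scikit-learn releases no longer provide .get_feature_names().

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumed placeholder; in the exercise, `documents` is preloaded.
documents = ["cats say meow", "dogs say woof", "dogs chase cats"]

# Create the vectorizer and build the tf-idf array in one call.
tfidf = TfidfVectorizer()
csr_mat = tfidf.fit_transform(documents)

# Convert the sparse matrix to a dense array for inspection.
print(csr_mat.toarray())

# Column labels: prefer the newer method name, fall back to the older one.
if hasattr(tfidf, "get_feature_names_out"):
    words = [str(w) for w in tfidf.get_feature_names_out()]
else:
    words = [str(w) for w in tfidf.get_feature_names()]

print(words)
```

Each row of the printed array corresponds to one document, and each column to the word at the same position in words.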