Applying TF-IDF to book descriptions

PyBooks has collected several book descriptions and wants to identify the important words in them using the TF-IDF encoding technique. They hope this will give them more insight into the unique attributes of each book and improve their book recommendation system.
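
To make "important words" concrete, here is a minimal sketch of a textbook TF-IDF score computed by hand. The three descriptions are made up for illustration, and the plain `log(N / df)` weighting is an assumption chosen for readability; scikit-learn's TfidfVectorizer defaults to a smoothed idf and L2-normalized rows, so its numbers will differ.

```python
import math
from collections import Counter

# Hypothetical book descriptions, invented for this illustration
docs = [
    "a tale of magic and dragons",
    "a tale of love and loss",
    "a guide to python programming",
]

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of descriptions that contain each word
df = Counter(word for tokens in tokenized for word in set(tokens))

def tf_idf(word, tokens):
    """Textbook TF-IDF: term frequency times log inverse document frequency."""
    tf = tokens.count(word) / len(tokens)
    idf = math.log(n_docs / df[word])
    return tf * idf

# Score every word of the first description
scores = {w: tf_idf(w, tokenized[0]) for w in set(tokenized[0])}
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {score:.3f}")
```

Under this weighting, a word that occurs in every description (here "a") scores zero, while a word unique to one description (here "dragons") scores highest.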

The following packages have been imported for you: torch, torchtext.

This exercise is part of the course

Deep Learning for Text with PyTorch

Instructions

  • Import the TfidfVectorizer class from sklearn.feature_extraction.text that converts a collection of raw documents to a matrix of TF-IDF features.
  • Instantiate an object of this class, then use this object to encode the descriptions into a TF-IDF matrix of vectors.
  • Retrieve and display the first five feature names from the vectorizer and the first five values of the first encoded vector from tfidf_encoded_descriptions.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Importing TF-IDF from sklearn
from sklearn.feature_extraction.text import ____

# Initialize TF-IDF encoding vectorizer
vectorizer = ____()
tfidf_encoded_descriptions = vectorizer.____(descriptions)

# Extract and print the first five features
print(____.get_feature_names_out()[:5])
print(____.toarray()[0, :5])