Word frequency analysis

Congratulations! You've just joined PyBooks. PyBooks is developing a book recommendation system and they want to find patterns and trends in text to improve their recommendations.

To begin, you'll want to understand the frequency of words in a given text and remove any rare words.

Note that typical real-world datasets will be larger than this example.

This exercise is part of the course

Deep Learning for Text with PyTorch

Exercise instructions

  • Import get_tokenizer from torchtext and FreqDist from the nltk library.
  • Initialize the tokenizer for English and tokenize the given text.
  • Calculate the frequency distribution of the tokens and remove rare words using a list comprehension, keeping only tokens whose count is greater than the threshold.
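
The idea behind these steps can be sketched with the standard library alone. This is a minimal illustration, not the exercise solution: `simple_tokenize` and `remove_rare_tokens` are hypothetical helper names, the regex tokenizer is only a rough stand-in for torchtext's `basic_english` tokenizer, and `collections.Counter` plays the role that `FreqDist` plays in the exercise. The sample sentence is also made up.

```python
import re
from collections import Counter


def simple_tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    Hypothetical stand-in for a real tokenizer; punctuation is dropped.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_rare_tokens(tokens, threshold):
    """Keep only tokens that occur more than `threshold` times."""
    counts = Counter(tokens)  # maps each token to its number of occurrences
    return [token for token in tokens if counts[token] > threshold]


sample = "the cat sat on the mat and the dog sat too"
tokens = simple_tokenize(sample)
common = remove_rare_tokens(tokens, threshold=1)
print(common)  # ['the', 'sat', 'the', 'the', 'sat']
```

Note that the filter keeps every occurrence of a frequent token, so the result preserves the original order and repetitions rather than producing a list of unique words.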

Interactive exercise

Try this exercise by completing the sample code below.

# Import the necessary functions
from torchtext.data.utils import ____
from nltk.probability import ____

text = "In the city of Dataville, a data analyst named Alex explores hidden insights within vast data. With determination, Alex uncovers patterns, cleanses the data, and unlocks innovation. Join this adventure to unleash the power of data-driven decisions."

# Initialize the tokenizer and tokenize the text
tokenizer = ____("basic_english")
tokens = tokenizer(____)

threshold = 1
# Remove rare words and print common tokens
freq_dist = ____(____)
common_tokens = [token for token in tokens if ____[token] > ____]
print(common_tokens)