TFIDF Practice

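Earlier you looked at a bag-of-words representation of articles on crude oil. TFIDF (term frequency-inverse document frequency) builds on those bag-of-words counts but weights each word by two factors: how often the word appears in a given article (term frequency), and how many articles in the collection contain it (inverse document frequency). A word that is frequent in one article but rare across the collection receives a high weight, while a word that appears in every article receives a weight of zero.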
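
The calculation can be sketched in a few lines of Python. This is a minimal sketch, assuming the definitions used by tidytext's bind_tf_idf() in R: tf is a word's count divided by the total number of words in the article, and idf is the natural log of the number of articles divided by the number of articles that contain the word. The tf_idf() helper and the toy corpus are hypothetical; they are neither the crude data nor part of the exercise.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF for every (doc_id, word) pair.

    docs: dict mapping a document id to its list of tokens (a bag of words).
    Returns a list of (doc_id, word, tf, idf, tf_idf) tuples.
    """
    n_docs = len(docs)

    # Document frequency: the number of documents each word appears in.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    rows = []
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        for word, n in counts.items():
            tf = n / total                           # share of this document's words
            idf = math.log(n_docs / doc_freq[word])  # natural log; 0 if the word is in every doc
            rows.append((doc_id, word, tf, idf, tf * idf))
    return rows


# Hypothetical toy corpus standing in for the crude articles.
docs = {
    1: ["oil", "price", "opec", "oil"],
    2: ["oil", "barrel", "price"],
    3: ["oil", "saudi", "output"],
}

weights = tf_idf(docs)
for row in weights:
    print(row)
```

In this toy corpus, "oil" appears in all three documents, so its idf and tf_idf are 0 everywhere. "opec" appears only in document 1 and receives the highest weight in that document.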

To judge how useful each word is for telling articles apart, calculate the TFIDF weights for the words in crude, a collection of 20 articles about crude oil.

Instructions
  • Calculate TFIDF values for crude by article_id and by word. Save the resulting tibble as crude_weights.
  • Sort crude_weights with the arrange() function by descending tf_idf values.
  • Filter crude_weights to the non-zero tf_idf values, then use arrange() again to sort them in ascending order so the lowest non-zero values appear first.
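
The sorting and filtering steps can be sketched in Python as well. This sketch assumes the weights are already computed and uses hypothetical (article_id, word, tf_idf) rows in place of crude_weights; the comments note the rough dplyr equivalents, but the exercise's exact solution code may differ.

```python
from operator import itemgetter

# Hypothetical (article_id, word, tf_idf) rows standing in for crude_weights.
weights = [
    (1, "oil", 0.0),
    (1, "opec", 0.27),
    (2, "price", 0.14),
    (2, "barrel", 0.37),
    (3, "oil", 0.0),
    (3, "saudi", 0.37),
]

# Sort by descending tf_idf (roughly arrange(desc(tf_idf)) in R).
by_desc = sorted(weights, key=itemgetter(2), reverse=True)

# Keep only non-zero weights, then sort ascending so the smallest
# non-zero values come first (roughly filter(tf_idf > 0) then arrange(tf_idf)).
lowest_nonzero = sorted((row for row in weights if row[2] > 0), key=itemgetter(2))

print(by_desc[:3])
print(lowest_nonzero[:3])
```

Filtering before the ascending sort matters: without it, the zero weights (words found in every article) would crowd the top of the result.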