Exercise

BoW Example

In literature reviews, researchers read and summarize as many available texts about a subject as possible. Sometimes they end up reading duplicate articles, or summaries of articles they have already read. You have been given 20 articles about crude oil as an R object named crude_tibble. Instead of jumping straight to reading each article, you have decided to see what words are shared across these articles. To do so, you will start by building a bag-of-words representation of the text.

Instructions
  • Create a BoW representation by counting how often each word appears in each article, using the column article_id.
  • Use the output to determine how many unique article/word combinations were created.
  • Filter the results to mentions of 'prices'.
  • How many articles use the word 'prices'?
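The steps above can be sketched on a small stand-in dataset. This is a minimal sketch, not the exercise solution: it assumes the dplyr and tidytext packages are installed, and it uses a hypothetical toy_tibble in place of crude_tibble. The text column name `text` is also an assumption about crude_tibble's layout. unnest_tokens() splits each article into one lowercase word per row, and count(article_id, word) collapses those rows into one row per article/word pair, which is the bag of words.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for crude_tibble (the real object holds 20 articles);
# the column name `text` is an assumption
toy_tibble <- tibble(
  article_id = c(1, 2, 3),
  text = c(
    "Oil prices rose as crude prices climbed.",
    "OPEC met to discuss output.",
    "Analysts expect prices to fall."
  )
)

# Bag of words: tokenize into one word per row, then count words per article
bow <- toy_tibble %>%
  unnest_tokens(output = word, input = text, token = "words") %>%
  count(article_id, word, sort = TRUE)

# Each row of bow is one unique article/word combination
n_combinations <- nrow(bow)

# Keep only the rows for "prices"; each remaining row is one article using it
prices_rows <- bow %>%
  filter(word == "prices")
n_prices_articles <- nrow(prices_rows)

print(n_combinations)
print(prices_rows)
print(n_prices_articles)
```

Because count() produces one row per article/word pair, the number of rows answers the "unique combinations" question directly, and after filtering to a single word, the number of rows equals the number of articles that contain it.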