Session Ready
Exercise

Frequent terms with tm

Now that you know how to make a term-document matrix, as well as its transpose, the document-term matrix, we will use it as the basis for some analysis. In order to analyze it, we need to change it to a simple matrix, as we did in chapter 1 using as.matrix().

Calling rowSums() on your newly made matrix aggregates all the terms used in a passage. Once you have the rowSums(), you can sort() them with decreasing = TRUE, so you can focus on the most common terms.

Lastly, you can make a barplot() of the top 5 terms of term_frequency with the following code.

barplot(term_frequency[1:5], col = "#C0DE25")

Of course, you could take our ggplot2 courses to learn how to customize the plot even more… :)

Instructions
100 XP
  • Create coffee_m as a matrix using the term-document matrix coffee_tdm from the last chapter.
  • Create term_frequency using the rowSums() function on coffee_m.
  • Sort term_frequency in descending order and store the result in term_frequency.
  • Use single square bracket subsetting, i.e., using only one [, to print the top 10 terms from term_frequency.
  • Make a barplot of the top 10 terms.