
Frequent terms with tm

Now that you know how to make a term-document matrix, as well as its transpose, the document-term matrix, we will use it as the basis for some analysis. In order to analyze it, we need to change it to a simple matrix, as we did in chapter 1 using as.matrix().

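Because each row of a term-document matrix is a term and each column is a document, calling rowSums() on your newly made matrix adds up each term's counts across all documents, giving the total frequency of every term. Once you have the row sums, you can sort() them with decreasing = TRUE so that the most common terms come first.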
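To see these two steps without the coffee data, here is a minimal sketch on a tiny hand-made matrix. The name toy_m, its terms, and its counts are all invented for illustration; only rowSums() and sort() come from the text above.

```r
# Toy term-document matrix: rows are terms, columns are documents.
# The terms and counts are invented for illustration only.
toy_m <- matrix(
  c(2, 1, 1,
    1, 1, 0,
    0, 0, 3,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Terms = c("coffee", "cup", "espresso", "milk"),
    Docs  = c("1", "2", "3")
  )
)

# Total count of each term across all documents
toy_frequency <- rowSums(toy_m)

# Put the most frequent terms first
toy_frequency <- sort(toy_frequency, decreasing = TRUE)
toy_frequency
```

The result is a named numeric vector, so each total stays attached to its term after sorting.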

Lastly, you can make a barplot() of the top 5 terms of term_frequency with the following code.

barplot(term_frequency[1:5], col = "#C0DE25")
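As a rough illustration of what the subsetting in that call does, the sketch below uses a small, already-sorted named vector. The name toy_frequency and its values are hypothetical stand-ins, not the course data.

```r
# Hypothetical, already-sorted term counts (invented for illustration)
toy_frequency <- c(coffee = 4, espresso = 3, cup = 2, milk = 1)

# Single-bracket subsetting keeps the names along with the values
top_terms <- toy_frequency[1:2]
top_terms

# barplot() uses the vector's names as bar labels;
# las = 2 draws the labels perpendicular to the axis,
# which helps when term names are long
barplot(top_terms, col = "#C0DE25", las = 2)
```

Because the names travel with the values, the bars are labelled with the terms without any extra arguments.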

Of course, you could take our ggplot2 courses to learn how to customize the plot even more… :)

This exercise is part of the course

Text Mining with Bag-of-Words in R


Exercise instructions

  • Create coffee_m as a matrix using the term-document matrix coffee_tdm from the last chapter.
  • Create term_frequency using the rowSums() function on coffee_m.
  • Sort term_frequency in descending order and store the result in term_frequency.
  • Use single square bracket subsetting, i.e., using only one [, to print the top 10 terms from term_frequency.
  • Make a barplot of the top 10 terms.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

## coffee_tdm is still loaded in your workspace

# Convert coffee_tdm to a matrix
coffee_m <- ___

# Calculate the row sums of coffee_m
term_frequency <- ___

# Sort term_frequency in decreasing order
term_frequency <- ___

# View the top 10 most common words
___

# Plot a bar chart of the 10 most common words
___(___, col = "tan", las = 2)