Frequent terms with tm
Now that you know how to make a term-document matrix, as well as its transpose, the document-term matrix, we will use it as the basis for some analysis. In order to analyze it, we need to change it to a simple matrix, as we did in chapter 1 using as.matrix().
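As a minimal sketch of that conversion, the snippet below builds a tiny term-document matrix with the tm package and turns it into a plain matrix. The two example sentences and the names docs, toy_corpus, toy_tdm and toy_m are made up for illustration; they are not the coffee data used in this course.

```r
library(tm)

# Two made-up documents (illustration only, not the course's coffee tweets)
docs <- c("coffee tastes great", "great coffee every morning")

# Build a corpus, then a term-document matrix (terms in rows, documents in columns)
toy_corpus <- VCorpus(VectorSource(docs))
toy_tdm <- TermDocumentMatrix(toy_corpus)

# Convert the sparse term-document matrix to an ordinary R matrix
toy_m <- as.matrix(toy_tdm)

# Rows are terms, columns are documents
dim(toy_m)
```

Once the object is a regular matrix, base R functions such as rowSums() can be applied to it directly.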
Calling rowSums() on your newly made matrix adds up each term's counts across all documents, giving the total frequency of every term. Once you have the rowSums(), you can sort() them with decreasing = TRUE, so you can focus on the most common terms.
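To make the summing and sorting concrete, here is a small sketch in base R. The matrix toy_m and its counts are invented for illustration; in the exercise you would work with the matrix made from coffee_tdm instead.

```r
# A hand-made term-by-document count matrix (hypothetical values)
toy_m <- matrix(
  c(2, 1, 0,
    0, 3, 2,
    1, 0, 1,
    0, 0, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Terms = c("coffee", "cup", "morning", "starbucks"),
    Docs  = c("1", "2", "3")
  )
)

# Sum each row: total number of times each term appears across all documents
term_frequency <- rowSums(toy_m)

# Put the most frequent terms first
term_frequency <- sort(term_frequency, decreasing = TRUE)

# Single square brackets keep the term names attached to the counts
term_frequency[1:2]
```

Because rowSums() returns a named vector, the term labels travel with the counts through sort() and subsetting.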
Lastly, you can make a barplot() of the top 5 terms of term_frequency with the following code.
barplot(term_frequency[1:5], col = "#C0DE25")
Of course, you could take our ggplot2 courses to learn how to customize the plot even more… :)
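For a plot you can run on its own, the sketch below uses a made-up, already sorted frequency vector in place of the real term_frequency. The las = 2 argument, which also appears in the sample code for this exercise, turns the axis labels perpendicular to the axis so longer terms stay readable.

```r
# Hypothetical, already sorted term counts (illustration only)
term_frequency <- c(cup = 5, starbucks = 4, coffee = 3,
                    morning = 2, shop = 1, tea = 1)

# Plot the five most frequent terms; bars are labelled with the vector's names.
# barplot() also returns the bar midpoints, which we keep here for inspection.
bar_positions <- barplot(term_frequency[1:5], col = "#C0DE25", las = 2)
```

The col argument accepts either a hex code, as here, or a named color such as "tan".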
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
- Create coffee_m as a matrix using the term-document matrix coffee_tdm from the last chapter.
- Create term_frequency using the rowSums() function on coffee_m.
- Sort term_frequency in descending order and store the result in term_frequency.
- Use single square bracket subsetting, i.e., using only one [, to print the top 10 terms from term_frequency.
- Make a barplot of the top 10 terms.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
## coffee_tdm is still loaded in your workspace
# Convert coffee_tdm to a matrix
coffee_m <- ___
# Calculate the row sums of coffee_m
term_frequency <- ___
# Sort term_frequency in decreasing order
term_frequency <- ___
# View the top 10 most common words
___
# Plot a barchart of the 10 most common words
___(___, col = "tan", las = 2)