Frequent terms with tm
Now that you know how to make a term-document matrix, as well as its transpose, the document-term matrix, we will use it as the basis for some analysis. In order to analyze it, we need to change it to a simple matrix, as we did in chapter 1 using as.matrix().
Calling rowSums() on your newly made matrix adds up each term's counts across all documents, because the rows of a term-document matrix are terms and the columns are documents. Once you have the row sums, you can sort() them with decreasing = TRUE, so you can focus on the most common terms.
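To make the two steps concrete, here is a minimal sketch in base R. It does not use coffee_tdm: toy_m, toy_frequency, and the counts are invented for illustration and stand in for the matrix you would get from as.matrix() on a real term-document matrix.

```r
# Toy term-by-document matrix: rows are terms, columns are documents.
# The counts are invented for illustration only.
toy_m <- matrix(
  c(2, 0, 1,
    0, 1, 0,
    3, 2, 4,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Terms = c("coffee", "tea", "cup", "shop"),
    Docs = c("1", "2", "3")
  )
)

# Total count of each term across all documents
toy_frequency <- rowSums(toy_m)

# Most frequent terms first
toy_frequency <- sort(toy_frequency, decreasing = TRUE)

print(toy_frequency)
```

The result is a named numeric vector, so each total stays attached to its term after sorting.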
Lastly, you can make a barplot() of the top 5 terms of term_frequency with the following code.
barplot(term_frequency[1:5], col = "#C0DE25")
Of course, you could take our ggplot2 courses to learn how to customize the plot even more… :)
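As a rough sketch of the plotting step, the snippet below draws the first few entries of a small named vector. The vector toy_frequency and its values are made up for illustration and assumed to be already sorted. The pdf(NULL) and dev.off() calls are only there so the sketch also runs without a display.

```r
# Invented, already-sorted term counts standing in for term_frequency
toy_frequency <- c(cup = 9, coffee = 3, shop = 2, tea = 1)

# Keep the first three entries; single brackets preserve the names,
# which barplot() uses as bar labels
top_terms <- toy_frequency[1:3]

# Draw to a null device so no file or window is needed;
# drop these device calls in an interactive session
pdf(NULL)
bar_midpoints <- barplot(top_terms, col = "tan", las = 2)
invisible(dev.off())
```

Setting las = 2 turns the axis labels perpendicular to the axis, which helps when the terms are long.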
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
- Create coffee_m as a matrix using the term-document matrix coffee_tdm from the last chapter.
- Create term_frequency using the rowSums() function on coffee_m.
- Sort term_frequency in descending order and store the result in term_frequency.
- Use single square bracket subsetting, i.e., using only one [, to print the top 10 terms from term_frequency.
- Make a barplot of the top 10 terms.
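If single square bracket subsetting is unfamiliar, the following sketch shows how it behaves on a small named vector. The vector toy_frequency is invented for illustration and is not the exercise data.

```r
# Invented term counts standing in for a sorted term_frequency
toy_frequency <- c(cup = 9, coffee = 3, shop = 2, tea = 1)

# Single brackets return a named sub-vector
top_two <- toy_frequency[1:2]

# Double brackets return one bare value without its name
first_value <- toy_frequency[[1]]

# Indexing past the end pads the result with NA,
# while head() stops at the last element
padded <- toy_frequency[1:6]
safe <- head(toy_frequency, 6)

print(top_two)
print(padded)
print(safe)
```

With a real corpus there are usually far more than 10 terms, so the padding only matters for very small inputs.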
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
## coffee_tdm is still loaded in your workspace
# Convert coffee_tdm to a matrix
coffee_m <- ___
# Calculate the row sums of coffee_m
term_frequency <- ___
# Sort term_frequency in decreasing order
term_frequency <- ___
# View the top 10 most common words
___
# Plot a barchart of the 10 most common words
___(___, col = "tan", las = 2)