1. Common text mining visuals
This whole chapter is devoted to making visuals.
2. Why make visuals?
There are many reasons that visualization is an important skill for a text miner. Chief among them is the fact that good visualizations help decision makers come to quick conclusions. This is because the human brain can efficiently process visual information.
3. Setting the scene
In this chapter, you work on two corpora. One corpus contains 1000 tweets mentioning coffee. The other is made of 1000 tweets mentioning chardonnay.
4. Setting the scene
This chapter assumes you can clean text and make a term document matrix as shown in chapter 1, so we can focus on the fun stuff. Many of the basic text mining plots in this chapter use a summed term vector, which holds the total count of each term across all documents.
5. Term frequency plots with tm
For a quick taste of text mining you already made a term frequency plot. This section explains two ways to make one.
To make a frequency plot with the tm package, you change your term document matrix into a matrix using as.matrix. Then you apply rowSums to the matrix to calculate a summed term vector. Next you use sort with the argument decreasing = TRUE so the most frequent terms come first. Once your summed term vector is sorted, you use plot, indexing the first ten terms, one to ten. Finally, specify the color "tan" and las = 2 to make the axis labels vertical.
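The steps just described can be sketched as follows. This is a minimal example, not the course's exact code: the three toy tweets and the object names (tweets_text, coffee_corpus, coffee_tdm, coffee_m, term_frequency) are made up for illustration, and the final step uses barplot, which is an assumption since the narration only says to plot the vector.

```r
library(tm)

# Hypothetical stand-in for the cleaned coffee tweets
tweets_text <- c(
  "coffee and a muffin",
  "iced coffee to start the morning",
  "morning coffee with a friend"
)

# Build a corpus and a term document matrix (terms in rows, documents in columns)
coffee_corpus <- VCorpus(VectorSource(tweets_text))
coffee_tdm <- TermDocumentMatrix(coffee_corpus)

# Convert to a plain matrix, then sum each row to get one count per term
coffee_m <- as.matrix(coffee_tdm)
term_frequency <- rowSums(coffee_m)

# Put the most frequent terms first
term_frequency <- sort(term_frequency, decreasing = TRUE)

# Plot up to the first ten terms; las = 2 turns the term labels vertical
# (barplot is an assumption here, chosen because it shows the term names)
n_terms <- min(10, length(term_frequency))
barplot(term_frequency[1:n_terms], col = "tan", las = 2)
```

With real data you would skip the toy vector and start from the term document matrix you built in chapter 1.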
6. Term frequency plots with qdap
qdap provides the freq_terms function to get word frequencies. You pass it a text vector, tweets$text, along with top = 10 and at.least = 3. The top argument specifies how many of the most frequent words you want in your plot. The at.least argument sets the minimum number of letters a word must have to be included, so here words shorter than three letters are dropped. Lastly, this example removes a predefined list of common words called Top200Words. Calling plot on the result gives you a slightly different frequent terms plot.
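Here is a small sketch of that call. The tweets data frame is a made-up stand-in for the real coffee tweets. The argument names top, at.least, and stopwords are written the way qdap's documentation spells them, so check ?freq_terms if your installed version behaves differently.

```r
library(qdap)

# Hypothetical stand-in for the coffee tweets data frame
tweets <- data.frame(
  text = c(
    "coffee and a muffin",
    "iced coffee to start the morning",
    "morning coffee with a friend"
  ),
  stringsAsFactors = FALSE
)

# Count word frequencies, keep the top 10, and drop very common words
# using the Top200Words list that ships with qdap's dictionaries
frequency <- freq_terms(
  tweets$text,
  top = 10,
  at.least = 3,
  stopwords = Top200Words
)

# plot() on the result draws the frequent terms chart
plot(frequency)
```

Because freq_terms works directly on the text vector, this route does not need a term document matrix first.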
7. Let's practice!