Exercise

Frequent terms with qdap

If you are willing to give up some control over the exact preprocessing steps, a fast way to get frequent terms is the freq_terms() function from qdap.

The function takes a text variable, which in our case is the tweets$text vector. You can set the number of top terms to show with the top argument, a vector of stop words to remove with the stopwords argument, and the minimum number of characters a word needs in order to be included with the at.least argument. qdap has its own list of stop words, which differs from the one in tm. This exercise shows you how to use each list and compare the results.

Making a basic plot of the results is easy. Just call plot() on the freq_terms() object.
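A minimal sketch of the call and the plot is shown here, assuming qdap is installed. The small tweets data frame is a hypothetical stand-in for the course data. Top200Words is passed as qdap's built-in vector of common English words (shipped in qdapDictionaries, which loads with qdap).

```r
library(qdap)

# Hypothetical stand-in for the course's tweets data frame
tweets <- data.frame(
  text = c(
    "Coffee is the best part of my morning routine",
    "Iced coffee or hot coffee, both are great choices",
    "Trying a new coffee shop downtown this morning",
    "My morning coffee tastes better with oat milk"
  ),
  stringsAsFactors = FALSE
)

# Top 10 terms, at least three letters each, with qdap's Top200Words removed
frequency <- freq_terms(
  tweets$text,
  top = 10,
  at.least = 3,
  stopwords = Top200Words
)

frequency        # data frame with WORD and FREQ columns
plot(frequency)  # bar chart of the term frequencies
```

If several words tie at the cutoff, freq_terms() may return more than 10 rows, because by default it keeps tied terms.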

Instructions 1/2

undefined XP
  • 1
    • Create frequency using the freq_terms() function on tweets$text. Include arguments to accomplish the following:
      • Limit to the top 10 terms.
      • At least three letters per term.
      • Use "Top200Words" to define stop words.
    • Produce a plot() of the frequency object. Compare it to the plot you produced in the previous exercise.
  • 2
    • Again, create frequency using the freq_terms() function on tweets$text. Include the following arguments:
      • Limit to the top 10 terms.
      • At least three letters per term.
      • This time use stopwords("english") to define stop words.
    • Produce a plot() of frequency and compare it to the plot from step 1. Do certain words change depending on which stop word list you use?
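To see how the choice of stop word list changes the result, here is a sketch under the same assumptions (qdap and tm installed, hypothetical sample tweets). It runs both variants and lists the terms that appear in one result but not the other. The object names freq_qdap and freq_tm are illustrative only. tm::stopwords() is called with its package prefix to avoid any ambiguity when both packages are attached.

```r
library(qdap)
library(tm)

# Hypothetical stand-in for the course's tweets data frame
tweets <- data.frame(
  text = c(
    "Coffee is the best part of my morning routine",
    "Iced coffee or hot coffee, both are great choices",
    "Trying a new coffee shop downtown this morning",
    "My morning coffee tastes better with oat milk"
  ),
  stringsAsFactors = FALSE
)

# Same settings, two different stop word lists
freq_qdap <- freq_terms(tweets$text, top = 10, at.least = 3,
                        stopwords = Top200Words)
freq_tm <- freq_terms(tweets$text, top = 10, at.least = 3,
                      stopwords = tm::stopwords("english"))

# Terms kept under one list but removed (or pushed out of the top 10) by the other
setdiff(as.character(freq_tm$WORD), as.character(freq_qdap$WORD))
setdiff(as.character(freq_qdap$WORD), as.character(freq_tm$WORD))

plot(freq_tm)
```

tm's English list consists mostly of function words, while Top200Words also contains everyday content words such as "time" and "people", so the two results can differ.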