Stop words and word clouds
Now that you are in the text mining mindset, sitting down for a nice glass of chardonnay, we need to dig deeper. In the last word cloud, "chardonnay" dominated the visual. It was so dominant that you couldn't draw out any other interesting insights.
Let's change the stop words to include "chardonnay" to see what other words are common, yet were originally drowned out.
Your workspace has a cleaned version of chardonnay tweets, but now let's remove some non-insightful terms. This exercise uses content() to show you a specific tweet for comparison. Remember to use double brackets to index the corpus list.
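As a minimal sketch of the indexing described above, the snippet below builds a tiny stand-in corpus and pulls the text out of one document. The names toy_tweets and toy_corp are hypothetical and are not part of the course workspace; the real chardonnay_corp is assumed to behave the same way.

```r
library(tm)

# Hypothetical stand-in tweets; the real chardonnay_corp lives in the course workspace
toy_tweets <- c("a crisp chardonnay tonight",
                "chardonnay pairs well with fish")
toy_corp <- VCorpus(VectorSource(toy_tweets))

# Double brackets select a single document; content() extracts its text
first_text <- content(toy_corp[[1]])
print(first_text)

# Single brackets return a smaller corpus rather than a document
print(class(toy_corp[1]))
```

Single brackets keep the corpus wrapper, which is why the exercise asks for double brackets before calling content().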
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
- Apply content() to the 24th document in chardonnay_corp.
- Append "chardonnay" to the English stopwords, assigning to stops.
- Examine the last six words in stops.
- Create cleaned_chardonnay_corp with tm_map() by passing in the chardonnay_corp, the function removeWords() and finally the stopwords, stops.
- Now examine the content of the 24th tweet again to compare results.
Interactive exercise
Try this exercise by completing this sample code.
# Review a "cleaned" tweet
___(___)
# Add to stopwords
stops <- c(stopwords(kind = 'en'), '___')
# Review last 6 stopwords
tail(stops)
# Apply to a corpus
cleaned_chardonnay_corp <- ___(chardonnay_corp, ___, ___)
# Review a "cleaned" tweet again
content(cleaned_chardonnay_corp[[24]])
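
The same workflow can be sketched end to end on a small stand-in corpus. This is an illustration under stated assumptions, not the exercise solution itself: toy_tweets, toy_corp and cleaned_toy_corp are hypothetical names, and the text is assumed to be lowercase already, since removeWords() matches case-sensitively.

```r
library(tm)

# Hypothetical stand-in for the cleaned tweets (already lowercase)
toy_tweets <- c("enjoying a glass of chardonnay with friends",
                "this chardonnay is the best wine of the summer")
toy_corp <- VCorpus(VectorSource(toy_tweets))

# Extend the English stop word list with the dominant term
stops <- c(stopwords(kind = "en"), "chardonnay")

# The appended word appears at the end of the vector
print(tail(stops))

# tm_map() applies removeWords() to every document, passing stops as its argument
cleaned_toy_corp <- tm_map(toy_corp, removeWords, stops)

# Compare the same document before and after cleaning
before <- content(toy_corp[[2]])
after <- content(cleaned_toy_corp[[2]])
print(before)
print(after)
```

removeWords() deletes the matched words but leaves the surrounding spaces in place, so the cleaned text may contain runs of whitespace; tm's stripWhitespace() can collapse them if needed.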