Stop words and word clouds

Now that you are in the text mining mindset, sitting down for a nice glass of chardonnay, we need to dig deeper. In the last word cloud, "chardonnay" dominated the visual. It was so dominant that you couldn't draw out any other interesting insights.

Let's change the stop words to include "chardonnay" to see what other words are common, yet were originally drowned out.

Your workspace has a cleaned version of chardonnay tweets, but now let's remove some non-insightful terms. This exercise uses content() to show you a specific tweet for comparison. Remember to use double brackets to index the corpus list.
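As a rough illustration of the double-bracket indexing mentioned above, here is a minimal sketch on a made-up two-tweet corpus. The names toy_tweets and toy_corp are hypothetical and stand in for the course's chardonnay_corp, which is not reproduced here. It assumes the tm package is installed.

```r
library(tm)

# Hypothetical toy tweets, already lower-cased (not the course data)
toy_tweets <- c("a nice glass of chardonnay tonight",
                "chardonnay and cheese with friends")

# Build a volatile corpus from the character vector
toy_corp <- VCorpus(VectorSource(toy_tweets))

# Single brackets return a (smaller) corpus;
# double brackets return one document from the corpus list
class(toy_corp[2])
class(toy_corp[[2]])

# content() extracts the text of that single document
content(toy_corp[[2]])
```

Single brackets keep the corpus wrapper, so content() is applied to the document returned by double brackets when you want to read one tweet.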

This exercise is part of the course

Text Mining with Bag-of-Words in R

Exercise instructions

  • Apply content() to the 24th document in chardonnay_corp.
  • Append "chardonnay" to the English stopwords, assigning to stops.
  • Examine the last six words in stops.
  • Create cleaned_chardonnay_corp with tm_map() by passing in the chardonnay_corp, the function removeWords() and finally the stopwords, stops.
  • Now examine the content of the 24th tweet again to compare results.
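The steps above can be sketched on a small made-up corpus. This is only an illustration under stated assumptions: toy_tweets, toy_corp, toy_stops and toy_clean are hypothetical names, the tweets are invented and already lower-cased, and the tm package is assumed to be installed.

```r
library(tm)

# Hypothetical toy tweets, already lower-cased (not the course data)
toy_tweets <- c("a nice glass of chardonnay tonight",
                "chardonnay and cheese with friends")
toy_corp <- VCorpus(VectorSource(toy_tweets))

# Extend the English stop word list with a domain-specific term
toy_stops <- c(stopwords("en"), "chardonnay")

# The appended term appears at the end of the vector
tail(toy_stops)

# Remove every word in toy_stops from each document in the corpus
toy_clean <- tm_map(toy_corp, removeWords, toy_stops)

# Compare one document before and after removal
content(toy_corp[[1]])
content(toy_clean[[1]])
```

removeWords() matches case-sensitively, which is why the toy tweets are lower-cased first. It also leaves the surrounding spaces in place, so a later pass with stripWhitespace() can tidy the result if needed.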

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Review a "cleaned" tweet
___(___)

# Add to stopwords
stops <- c(stopwords(kind = 'en'), '___')

# Review last 6 stopwords 
tail(stops)

# Apply to a corpus
cleaned_chardonnay_corp <- ___(chardonnay_corp, ___, ___)

# Review a "cleaned" tweet again
content(cleaned_chardonnay_corp[[24]])