Feature extraction & analysis: amzn_cons

You now decide to contrast the previous result with the amzn_cons_corp corpus by building another bigram TDM (term-document matrix). Because this corpus holds the negative reviews, you should expect different phrases to appear in the word cloud.

Once again, you will use this custom function to extract your bigram features for the visual:

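# Split text into bigrams (two-word phrases); NGramTokenizer() and Weka_control() come from the RWeka package
tokenizer <- function(x) 
  NGramTokenizer(x, Weka_control(min = 2, max = 2))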
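To see what the tokenizer produces before it is plugged into a TDM, here is a minimal sketch that applies it to a single string. The sample sentence is made up for illustration and is not taken from the course data; the sketch assumes the RWeka package (and the Java runtime it depends on) is installed.

```r
library(RWeka)

# Same bigram tokenizer as in the exercise: min = max = 2 keeps only two-word phrases
tokenizer <- function(x)
  NGramTokenizer(x, Weka_control(min = 2, max = 2))

# Hypothetical review text, used only to illustrate the output
sample_review <- "long hours and high stress"

# Each element of the result is one pair of adjacent words from the sentence
bigrams <- tokenizer(sample_review)
print(bigrams)
```

A sentence of n words yields n - 1 overlapping bigrams, so these phrases become the row names of the TDM instead of single words.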

This exercise is part of the course

Text Mining with Bag-of-Words in R


Exercise instructions

Create amzn_c_tdm by converting amzn_cons_corp into a TermDocumentMatrix, passing the bigram tokenizer with control = list(tokenize = tokenizer).
Create amzn_c_tdm_m as a matrix version of amzn_c_tdm.
Create amzn_c_freq by using rowSums() to get term frequencies from amzn_c_tdm_m.
Create a wordcloud() using names(amzn_c_freq) and the values amzn_c_freq. Use the arguments max.words = 25 and color = "red" as well.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create amzn_c_tdm
___ <- ___(
  ___,
  ___
)

# Create amzn_c_tdm_m
___ <- ___

# Create amzn_c_freq
___ <- ___

# Plot a word cloud of negative Amazon bigrams
___
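One possible way to fill in the blanks is sketched below. It is not the official solution. Because amzn_cons_corp exists only in the course environment, the sketch builds a tiny hypothetical stand-in corpus with made-up reviews; in the exercise itself you would skip that step and use the corpus provided. It assumes the tm, RWeka and wordcloud packages are installed.

```r
library(tm)
library(RWeka)
library(wordcloud)

# Hypothetical stand-in for the course's amzn_cons_corp (made-up reviews).
# VCorpus is used so that the custom tokenizer is honored by tm.
amzn_cons_corp <- VCorpus(VectorSource(c(
  "long hours and high stress",
  "long hours with poor work life balance",
  "high stress and no work life balance"
)))

# Bigram tokenizer from the exercise
tokenizer <- function(x)
  NGramTokenizer(x, Weka_control(min = 2, max = 2))

# Create amzn_c_tdm
amzn_c_tdm <- TermDocumentMatrix(
  amzn_cons_corp,
  control = list(tokenize = tokenizer)
)

# Create amzn_c_tdm_m
amzn_c_tdm_m <- as.matrix(amzn_c_tdm)

# Create amzn_c_freq: summing each row gives a bigram's total count across documents
amzn_c_freq <- rowSums(amzn_c_tdm_m)

# Plot a word cloud of negative Amazon bigrams
wordcloud(names(amzn_c_freq), amzn_c_freq, max.words = 25, color = "red")
```

The rows of the matrix are bigrams and the columns are documents, which is why rowSums() rather than colSums() gives the term frequencies that wordcloud() needs.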
