Feature extraction & analysis: amzn_cons

You now decide to contrast this with the amzn_cons_corp corpus in another bigram TDM. Of course, you expect to see some different phrases in your word cloud.

Once again, you will use this custom function to extract your bigram features for the visual:

tokenizer <- function(x) 
  NGramTokenizer(x, Weka_control(min = 2, max = 2))

This exercise is part of the course

Text Mining with Bag-of-Words in R

Instructions

  • Create amzn_c_tdm by converting amzn_cons_corp into a TermDocumentMatrix, passing the bigram tokenizer through control = list(tokenize = tokenizer).
  • Create amzn_c_tdm_m as a matrix version of amzn_c_tdm.
  • Create amzn_c_freq by using rowSums() to get term frequencies from amzn_c_tdm_m.
  • Create a wordcloud() using names(amzn_c_freq) and the values amzn_c_freq. Use the arguments max.words = 25 and color = "red" as well.
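
The steps above can be sketched on a small stand-in corpus. This is a minimal illustration, not the exercise solution: the three review strings and every toy_* name are hypothetical, chosen only to show the same pipeline (bigram TDM, matrix, row sums, word cloud) that the instructions apply to amzn_cons_corp. It assumes the tm, RWeka, and wordcloud packages are installed. min.freq = 1 is added only because the toy corpus is so small.

```r
library(tm)
library(RWeka)
library(wordcloud)

# Hypothetical stand-in for amzn_cons_corp (not the course data)
toy_cons <- c("long hours and no work life balance",
              "work life balance is poor",
              "long hours every week")
toy_corp <- VCorpus(VectorSource(toy_cons))

# Bigram tokenizer, as defined in the exercise text
tokenizer <- function(x)
  NGramTokenizer(x, Weka_control(min = 2, max = 2))

# Term-document matrix whose terms are bigrams rather than single words
toy_tdm <- TermDocumentMatrix(toy_corp, control = list(tokenize = tokenizer))

# Dense matrix: one row per bigram, one column per document
toy_tdm_m <- as.matrix(toy_tdm)

# Total count of each bigram across all documents
toy_freq <- rowSums(toy_tdm_m)

# Word cloud of the bigrams; min.freq = 1 keeps rare terms in this tiny corpus
wordcloud(names(toy_freq), toy_freq,
          min.freq = 1, max.words = 25, color = "red")
```

Sorting the frequencies with sort(toy_freq, decreasing = TRUE) before plotting is a quick way to check which bigrams should appear largest in the cloud.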

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create amzn_c_tdm
___ <- ___(
  ___,
  ___
)

# Create amzn_c_tdm_m
___ <- ___

# Create amzn_c_freq
___ <- ___

# Plot a word cloud of negative Amazon bigrams
___