Feature extraction & analysis: amzn_cons

You now decide to contrast your earlier results with the amzn_cons_corp corpus by building another bigram term-document matrix (TDM). Because this corpus holds the negative reviews, you should expect different phrases to stand out in the word cloud.

Once again, you will use this custom function to extract your bigram features for the visual:

# Bigram tokenizer: NGramTokenizer() and Weka_control() come from the RWeka package
tokenizer <- function(x)
  NGramTokenizer(x, Weka_control(min = 2, max = 2))
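
To make concrete what a bigram tokenizer returns, here is a minimal base-R sketch that does not depend on RWeka. The helper name toy_bigrams and the sample sentence are hypothetical, and splitting on whitespace is a simplification of how NGramTokenizer() actually tokenizes.

```r
# Hypothetical helper for illustration only; it is not part of RWeka or tm.
toy_bigrams <- function(x) {
  # Split on runs of whitespace (a simplification of a real tokenizer)
  words <- strsplit(x, "\\s+")[[1]]
  # With fewer than two words there is no bigram to return
  if (length(words) < 2) return(character(0))
  # Pair each word with the word that follows it
  paste(head(words, -1), tail(words, -1))
}

toy_bigrams("long hours and long hours")
# "long hours" "hours and" "and long" "long hours"
```

Each bigram overlaps its neighbour by one word, so a five-word string yields four bigrams, and repeated phrases such as "long hours" appear more than once. Those repeats are what the term frequencies count.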

This exercise is part of the course

Text Mining with Bag-of-Words in R

Exercise instructions

  • Create amzn_c_tdm by converting amzn_cons_corp into a TermDocumentMatrix(), passing the bigram tokenizer through control = list(tokenize = tokenizer).
  • Create amzn_c_tdm_m as a matrix version of amzn_c_tdm.
  • Create amzn_c_freq by using rowSums() to get term frequencies from amzn_c_tdm_m.
  • Create a wordcloud() using names(amzn_c_freq) and the values amzn_c_freq. Use the arguments max.words = 25 and color = "red" as well.
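
The same four steps can be sketched on a tiny stand-in corpus. This is an illustration under stated assumptions, not the exercise solution: the toy_* names and the three sample reviews are hypothetical, the tm, RWeka, and wordcloud packages are assumed to be installed (RWeka also needs a working Java setup), and min.freq = 1 is added only because the toy counts are too small for wordcloud()'s default threshold.

```r
library(tm)
library(RWeka)
library(wordcloud)

# Bigram tokenizer, as defined earlier in the exercise
tokenizer <- function(x)
  NGramTokenizer(x, Weka_control(min = 2, max = 2))

# Hypothetical stand-in for amzn_cons_corp; a VCorpus is used because
# custom tokenizers are not applied to a SimpleCorpus
toy_reviews <- c(
  "long hours and low pay",
  "long hours with poor management",
  "poor management and low pay"
)
toy_corp <- VCorpus(VectorSource(toy_reviews))

# Step 1: bigram term-document matrix
toy_tdm <- TermDocumentMatrix(toy_corp, control = list(tokenize = tokenizer))

# Step 2: convert to an ordinary matrix (rows are bigrams, columns are documents)
toy_tdm_m <- as.matrix(toy_tdm)

# Step 3: sum each row to get the total frequency of every bigram
toy_freq <- rowSums(toy_tdm_m)

# Step 4: plot; `colors` is the full argument name, and `color` reaches it
# through R's partial argument matching
set.seed(1)  # the word layout is random
wordcloud(names(toy_freq), toy_freq,
          max.words = 25, min.freq = 1, colors = "red")
```

Sorting the frequencies with sort(toy_freq, decreasing = TRUE) before plotting is a quick way to check which bigrams will dominate the cloud.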

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Create amzn_c_tdm
___ <- ___(
  ___,
  ___
)

# Create amzn_c_tdm_m
___ <- ___

# Create amzn_c_freq
___ <- ___

# Plot a word cloud of negative Amazon bigrams
___