Feature extraction & analysis: amzn_cons
You now decide to contrast this with the amzn_cons_corp corpus in another bigram TDM. Of course, you expect to see some different phrases in your word cloud.
Once again, you will use this custom function to extract your bigram features for the visual:
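library(RWeka)  # provides NGramTokenizer() and Weka_control()
# Tokenize text into bigrams (two-word phrases)
tokenizer <- function(x)
  NGramTokenizer(x, Weka_control(min = 2, max = 2))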
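To see what kind of tokens a bigram tokenizer produces, here is a minimal base R sketch that pairs adjacent words in a single string. The helper bigram_sketch() and the sample sentence are hypothetical, made up for illustration; this is not how RWeka implements NGramTokenizer(), only a rough stand-in that splits on whitespace.

```r
# Hypothetical helper for illustration only: pairs each word with the next one.
bigram_sketch <- function(x) {
  # Lowercase and split on runs of whitespace
  words <- strsplit(tolower(x), "\\s+")[[1]]
  words <- words[words != ""]
  # Fewer than two words means no bigrams
  if (length(words) < 2) return(character(0))
  # Word i is pasted together with word i + 1
  paste(head(words, -1), tail(words, -1))
}

# Made-up sample text, not taken from the Amazon reviews
bigrams <- bigram_sketch("long hours and no work life balance")
print(bigrams)
```

A sentence of seven words yields six overlapping bigrams, which is why bigram TDMs tend to have many more rows than single-word TDMs.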
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
- Create amzn_c_tdm by converting amzn_cons_corp into a TermDocumentMatrix and incorporating the bigram function control = list(tokenize = tokenizer).
- Create amzn_c_tdm_m as a matrix version of amzn_c_tdm.
- Create amzn_c_freq by using rowSums() to get term frequencies from amzn_c_tdm_m.
- Create a wordcloud() using names(amzn_c_freq) and the values amzn_c_freq. Use the arguments max.words = 25 and color = "red" as well.
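rowSums() yields term frequencies because a term-document matrix stores terms as rows and documents as columns. The following is a minimal sketch of that step on a hand-made matrix; toy_m and its counts are invented for illustration and do not come from the Amazon reviews.

```r
# Invented toy term-document matrix: rows are bigrams, columns are documents.
toy_m <- matrix(
  c(1, 0, 2,
    0, 1, 1,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Terms = c("long hours", "work life", "no time"),
    Docs  = c("1", "2", "3")
  )
)

# Summing across each row adds up a term's counts over all documents.
toy_freq <- rowSums(toy_m)

# Sorting puts the most frequent bigrams first.
toy_freq <- sort(toy_freq, decreasing = TRUE)
print(toy_freq)
```

The result is a named numeric vector: its names are the bigrams and its values are their counts, which are the two pieces wordcloud() takes as its first two arguments.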
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Create amzn_c_tdm
___ <- ___(
___,
___
)
# Create amzn_c_tdm_m
___ <- ___
# Create amzn_c_freq
___ <- ___
# Plot a word cloud of negative Amazon bigrams
___