Feature extraction & analysis: amzn_cons
You now decide to contrast this with the amzn_cons_corp corpus in another bigram TDM. Of course, you expect to see some different phrases in your word cloud.
Once again, you will use this custom function to extract your bigram features for the visual:
tokenizer <- function(x)
  NGramTokenizer(x, Weka_control(min = 2, max = 2))
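To make the idea of a bigram feature concrete, here is a minimal base-R sketch of what a bigram tokenizer produces. It does not call NGramTokenizer(); make_bigrams() and the sample sentence are hypothetical, for illustration only, and the real tokenizer may split text on different delimiters.

```r
# Illustration only: a base-R stand-in for a bigram tokenizer.
# make_bigrams() is a hypothetical helper, not part of tm or RWeka.
make_bigrams <- function(text) {
  # Lowercase, then split on anything that is not a letter
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  # Fewer than two words means there are no bigrams
  if (length(words) < 2) return(character(0))
  # Pair each word with the word that follows it
  paste(words[-length(words)], words[-1])
}

# Made-up review text
bigrams <- make_bigrams("Long hours and no work life balance")
print(bigrams)
```

Each element is a two-word phrase, which is what setting min = 2 and max = 2 asks the tokenizer for.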
This exercise is part of the course Text Mining with Bag-of-Words in R.
Exercise instructions
- Create amzn_c_tdm by converting amzn_cons_corp into a TermDocumentMatrix and incorporating the bigram function with control = list(tokenize = tokenizer).
- Create amzn_c_tdm_m as a matrix version of amzn_c_tdm.
- Create amzn_c_freq by using rowSums() to get term frequencies from amzn_c_tdm_m.
- Create a wordcloud() using names(amzn_c_freq) and the values amzn_c_freq. Use the arguments max.words = 25 and color = "red" as well.
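The frequency step can be sketched on a toy matrix. This is not the exercise solution: toy_tdm_m, its bigrams, and its counts are made up, and the matrix is built by hand to mimic the terms-in-rows, documents-in-columns layout of a term-document matrix. The plot runs only if the wordcloud package is installed. min.freq = 1 is an addition here because the toy counts are small, and colors is the full name of the argument the instructions abbreviate as color.

```r
# Toy stand-in for the matrix version of a bigram TDM:
# terms in rows, documents in columns. Counts are made up.
toy_tdm_m <- matrix(
  c(1, 0, 2,
    0, 1, 1,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Terms = c("long hours", "no growth", "work life"),
    Docs  = c("1", "2", "3")
  )
)

# Sum across documents to get one total per bigram
toy_freq <- rowSums(toy_tdm_m)

# Put the most frequent bigrams first
toy_freq <- sort(toy_freq, decreasing = TRUE)
print(toy_freq)

# Plot only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(toy_freq), toy_freq,
                       min.freq = 1, max.words = 25, colors = "red")
}
```

Because rowSums() keeps the row names, names(toy_freq) gives the phrases and toy_freq gives their sizes, which is the pairing wordcloud() expects.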
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create amzn_c_tdm
___ <- ___(
___,
___
)
# Create amzn_c_tdm_m
___ <- ___
# Create amzn_c_freq
___ <- ___
# Plot a word cloud of negative Amazon bigrams
___