amzn_cons dendrogram
The reviews strongly suggest long working hours and poor work-life balance. As a simple clustering technique, you decide to perform hierarchical clustering and create a dendrogram to see how connected these phrases are.
This exercise is part of the course
Text Mining with Bag-of-Words in R
Instructions
- Create amzn_c_tdm as a TermDocumentMatrix using amzn_cons_corp with control = list(tokenize = tokenizer).
- Print amzn_c_tdm to the console.
- Create amzn_c_tdm2 by applying the removeSparseTerms() function to amzn_c_tdm with the sparse argument equal to .993.
- Create hc, a hierarchical cluster object, by nesting the distance matrix dist(amzn_c_tdm2) inside the hclust() function. Make sure to also pass method = "complete" to the hclust() function.
- Plot hc to view the clustered bigrams and see how the concepts in the Amazon cons section may lead you to a conclusion.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Create amzn_c_tdm
___ <- ___(
___,
___
)
# Print amzn_c_tdm to the console
___
# Create amzn_c_tdm2 by removing sparse terms
___ <- ___
# Create hc as a cluster of distance values
___ <- ___(___,
___)
# Produce a plot of hc
___
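
One way the steps above might fit together is sketched below. The exercise relies on amzn_cons_corp and tokenizer already existing in the course workspace. Neither is shown here, so this sketch substitutes a tiny made-up corpus and a bigram tokenizer built on NLP::ngrams(). Both are assumptions for illustration, not the course's own objects, and the resulting dendrogram will not match the one from the real reviews.

```r
library(tm)

# ASSUMPTION: hypothetical stand-in for the course's amzn_cons_corp,
# built from four invented review snippets.
cons <- c(
  "long hours and poor work life balance",
  "long hours with no work life balance",
  "poor management and long hours",
  "work life balance is poor"
)
amzn_cons_corp <- VCorpus(VectorSource(cons))

# ASSUMPTION: a bigram tokenizer; the course supplies its own `tokenizer`.
tokenizer <- function(x) {
  unlist(
    lapply(NLP::ngrams(NLP::words(x), 2), paste, collapse = " "),
    use.names = FALSE
  )
}

# Create amzn_c_tdm
amzn_c_tdm <- TermDocumentMatrix(
  amzn_cons_corp,
  control = list(tokenize = tokenizer)
)

# Print amzn_c_tdm to the console
amzn_c_tdm

# Create amzn_c_tdm2 by removing sparse terms
amzn_c_tdm2 <- removeSparseTerms(amzn_c_tdm, sparse = 0.993)

# Create hc as a cluster of distance values
hc <- hclust(dist(amzn_c_tdm2),
             method = "complete")

# Produce a plot of hc
plot(hc)
```

With only four documents, the .993 sparsity threshold keeps every bigram. On a corpus of several hundred reviews, the same threshold would drop bigrams that appear in very few documents, which is what keeps the dendrogram readable.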