amzn_cons dendrogram
The reviews strongly suggest long working hours and poor work-life balance. To see how closely these phrases are connected, you decide to use a simple clustering technique: perform hierarchical clustering and plot the result as a dendrogram.
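The following is a minimal base-R sketch of that idea on a made-up term-document matrix. The bigrams, reviews and counts are hypothetical and are not taken from the Amazon data. dist() measures the Euclidean distance between rows, so each term is compared by its pattern of occurrence across documents, and hclust() merges the closest terms first. Terms that tend to appear in the same reviews therefore end up on the same branch of the dendrogram.

```r
# Hypothetical term-document matrix: rows are bigrams, columns are reviews
toy_tdm <- matrix(
  c(1, 1, 0, 0, 1,
    1, 1, 0, 0, 1,
    0, 0, 1, 1, 0,
    0, 0, 1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("long hours", "work life", "great pay", "good benefits"),
    paste0("review", 1:5)
  )
)

# Euclidean distances between rows, i.e. between terms
toy_dist <- dist(toy_tdm)

# Agglomerative clustering; each merge height is the distance between the joined clusters
toy_hc <- hclust(toy_dist, method = "complete")

# Cut the tree into two groups of co-occurring terms
toy_groups <- cutree(toy_hc, k = 2)
print(toy_groups)

# Draw the dendrogram
plot(toy_hc)
```

Here "long hours" and "work life" occur in exactly the same reviews, so they merge at height 0 and fall into the same group, separate from the two pay-related bigrams.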
This exercise is part of the course Text Mining with Bag-of-Words in R.
Exercise instructions
- Create amzn_c_tdm as a TermDocumentMatrix using amzn_cons_corp with control = list(tokenize = tokenizer).
- Print amzn_c_tdm to the console.
- Create amzn_c_tdm2 by applying the removeSparseTerms() function to amzn_c_tdm with the sparse argument equal to .993.
- Create hc, a hierarchical cluster object, by nesting the distance matrix dist(amzn_c_tdm2) inside the hclust() function. Make sure to also pass method = "complete" to hclust().
- Plot hc to view the clustered bigrams and see what conclusions the concepts in the Amazon cons section point to.
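The tm documentation describes removeSparseTerms() as removing terms that have at least a sparse fraction of empty (zero-count) entries. The helper below, keep_dense_terms(), is a hypothetical base-R approximation of that rule on an ordinary matrix, not tm's own implementation: a term survives only if it occurs in more than ncol(m) * (1 - sparse) documents. With sparse = .993 and, say, a hypothetical 1,000 reviews, the threshold is 1000 * 0.007 = 7, so a bigram would need to appear in at least 8 reviews to be kept.

```r
# keep_dense_terms() is a hypothetical helper that mimics the documented
# behaviour of tm::removeSparseTerms() on a plain term-by-document matrix:
# a term is kept only if it occurs in more than ncol(m) * (1 - sparse) documents
keep_dense_terms <- function(m, sparse) {
  stopifnot(is.numeric(sparse), sparse > 0, sparse < 1)
  doc_freq <- rowSums(m > 0)  # number of documents containing each term
  keep <- doc_freq > ncol(m) * (1 - sparse)
  m[keep, , drop = FALSE]
}

# Toy matrix: 3 hypothetical bigrams across 10 documents
m <- rbind(
  "long hours" = c(1, 2, 1, 0, 1, 0, 0, 0, 0, 0),  # in 4 of 10 documents
  "work life"  = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0),  # in 2 of 10 documents
  "rare term"  = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)   # in 1 of 10 documents
)

dense <- keep_dense_terms(m, sparse = 0.75)   # keeps terms in more than 2.5 documents
looser <- keep_dense_terms(m, sparse = 0.85)  # keeps terms in more than 1.5 documents
rownames(dense)
rownames(looser)
```

A larger sparse value keeps more terms; .993 is quite permissive, which is why it still leaves enough bigrams to cluster.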
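# Assumes amzn_cons_corp (a corpus of Amazon cons reviews) and tokenizer
# (a bigram tokenizer) were created in earlier exercises
library(tm)

# Create amzn_c_tdm: a term-document matrix of bigrams
amzn_c_tdm <- TermDocumentMatrix(
  amzn_cons_corp,
  control = list(tokenize = tokenizer)
)

# Print amzn_c_tdm to the console
amzn_c_tdm

# Create amzn_c_tdm2 by removing sparse terms
amzn_c_tdm2 <- removeSparseTerms(amzn_c_tdm, sparse = .993)

# Create hc as a cluster of distance values (complete linkage)
hc <- hclust(dist(amzn_c_tdm2), method = "complete")

# Produce a plot of hc
plot(hc)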
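The method = "complete" argument to hclust() selects complete linkage: when two clusters are merged, the distance between them is taken as the largest pairwise distance between their members. The sketch below uses three hypothetical one-dimensional points, chosen only for easy arithmetic, to contrast it with single linkage, which uses the smallest pairwise distance instead.

```r
# Three hypothetical terms described by a single feature
x <- matrix(c(0, 1, 5), ncol = 1, dimnames = list(c("a", "b", "c"), NULL))
d <- dist(x)  # pairwise distances: a-b = 1, a-c = 5, b-c = 4

complete <- hclust(d, method = "complete")
single   <- hclust(d, method = "single")

complete$height  # 1 5: {a, b} joins c at max(5, 4)
single$height    # 1 4: {a, b} joins c at min(5, 4)
```

In both trees a and b merge first at height 1; only the height at which c joins differs. Complete linkage tends to produce compact clusters rather than the long chains single linkage can create.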