Exercise

amzn_cons dendrogram

It seems there is a strong indication of long working hours and poor work-life balance in the reviews. As a simple clustering technique, you decide to perform a hierarchical cluster and create a dendrogram to see how connected these phrases are.

Instructions

100 XP
  • Create amzn_c_tdm as a TermDocumentMatrix using amzn_cons_corp with control = list(tokenize = tokenizer).
  • Print amzn_c_tdm to the console.
  • Create amzn_c_tdm2 by applying the removeSparseTerms() function to amzn_c_tdm with the sparse argument equal to .993.
  • Create hc, a hierarchical cluster object by nesting the distance matrix dist(amzn_c_tdm2) inside the hclust() function. Make sure to also pass method = "complete" to the hclust() function.
  • Plot hc to view the clustered bigrams and see how the concepts in the Amazon cons section may lead you to a conclusion.