
amzn_cons dendrogram

The reviews seem to point strongly to long working hours and poor work-life balance. To see how closely these phrases are connected, you decide to apply a simple clustering technique: hierarchical clustering, visualized as a dendrogram.
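To show what hierarchical clustering with complete linkage does before it is applied to the reviews, here is a minimal base-R sketch on a small term-document matrix. The four bigrams and their counts are made up for illustration. As in a tm TermDocumentMatrix, terms are rows, so dist() measures distances between terms and hclust() groups terms rather than reviews.

```r
# Made-up counts: rows are bigrams (terms), columns are reviews (documents),
# the same orientation as a tm TermDocumentMatrix.
toy_tdm <- matrix(
  c(3, 1, 0, 0,   # long hours
    3, 1, 1, 0,   # work life
    2, 1, 1, 1,   # life balance
    0, 0, 2, 3),  # high turnover
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("long hours", "work life", "life balance", "high turnover"),
    paste0("review_", 1:4)
  )
)

# Euclidean distances between the term rows
d <- dist(toy_tdm)

# Complete linkage: the distance between two clusters is the LARGEST
# pairwise distance between any of their members
hc <- hclust(d, method = "complete")

# Merge heights, in merge order: 1, sqrt(3), sqrt(23)
hc$height

# Draw the dendrogram
plot(hc)
```

Here "long hours" and "work life" merge first at height 1. "life balance" is only sqrt(2) away from "work life", but it joins that pair at sqrt(3) because complete linkage uses its distance to the farthest member, "long hours". "high turnover" joins last. Tall joins like that last one are what separate distinct themes in a dendrogram.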

This exercise is part of the course Text Mining with Bag-of-Words in R.

Exercise instructions

  • Create amzn_c_tdm, a TermDocumentMatrix built from amzn_cons_corp with control = list(tokenize = tokenizer).
  • Print amzn_c_tdm to the console.
  • Create amzn_c_tdm2 by applying removeSparseTerms() to amzn_c_tdm with the sparse argument set to 0.993.
  • Create hc, a hierarchical cluster object, by nesting the distance matrix dist(amzn_c_tdm2) inside hclust(). Make sure to also pass method = "complete" to hclust().
  • Plot hc to view the clustered bigrams and judge whether the concepts in the Amazon cons reviews support the impression of long working hours and poor work-life balance.
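The sparse = 0.993 step decides which bigrams survive into the dendrogram. As we understand tm's implementation, removeSparseTerms() keeps a term only if it appears in more than (1 - sparse) × (number of documents) documents, so 0.993 keeps terms found in more than 0.7% of the reviews. The sketch below applies that rule to a plain matrix; keep_dense_terms() is a hypothetical helper, not part of tm, and the data are made up.

```r
# keep_dense_terms() is a hypothetical helper, NOT part of tm. It applies the
# rule we believe removeSparseTerms() uses: keep a term only if it occurs in
# more than (1 - sparse) * (number of documents) documents.
keep_dense_terms <- function(m, sparse) {
  stopifnot(is.numeric(sparse), sparse > 0, sparse < 1)
  docs_with_term <- rowSums(m > 0)          # documents containing each term
  m[docs_with_term > ncol(m) * (1 - sparse), , drop = FALSE]
}

# Made-up data: 4 bigrams (rows) across 1,000 reviews (columns)
n_docs <- 1000
toy_m <- matrix(0, nrow = 4, ncol = n_docs,
                dimnames = list(c("parking lot", "night shift",
                                  "long hours", "work life"), NULL))
toy_m["parking lot", 1] <- 1      # in 1 review
toy_m["night shift", 1:5] <- 1    # in 5 reviews
toy_m["long hours", 1:8] <- 1     # in 8 reviews
toy_m["work life", 1:50] <- 1     # in 50 reviews

# Threshold: 1000 * (1 - 0.993) is about 7, so the terms found in
# only 1 or 5 reviews are dropped
dense <- keep_dense_terms(toy_m, sparse = 0.993)
rownames(dense)   # "long hours" "work life"
```

Raising sparse toward 1 keeps more (rarer) terms and makes the dendrogram more crowded; lowering it keeps only the most widespread bigrams.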

Hands-on interactive exercise

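One way to complete the sample code is shown below. It assumes the tm package is installed and that amzn_cons_corp and tokenizer were created in earlier exercises of the course.

# Load tm for TermDocumentMatrix() and removeSparseTerms()
library(tm)

# Create amzn_c_tdm: a bigram term-document matrix of the cons reviews
amzn_c_tdm <- TermDocumentMatrix(
  amzn_cons_corp,
  control = list(tokenize = tokenizer)
)

# Print amzn_c_tdm to the console
amzn_c_tdm

# Create amzn_c_tdm2 by removing sparse terms
amzn_c_tdm2 <- removeSparseTerms(amzn_c_tdm, sparse = 0.993)

# Create hc as a cluster of distance values (complete linkage)
hc <- hclust(dist(amzn_c_tdm2),
             method = "complete")

# Produce a plot of hc
plot(hc)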
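The control = list(tokenize = tokenizer) argument is what makes the matrix rows bigrams rather than single words. The exercise's own tokenizer is defined earlier in the course and is not shown here, so the function below is a hypothetical base-R stand-in that illustrates the idea: lower-case the text, split it into words, and pair each word with the next one.

```r
# bigram_tokenizer() is a hypothetical base-R stand-in for the course's
# `tokenizer`: it lower-cases the text, splits it into words, and returns
# every pair of adjacent words.
bigram_tokenizer <- function(x) {
  text <- tolower(paste(as.character(x), collapse = " "))
  words <- unlist(strsplit(text, "[^a-z']+"))
  words <- words[words != ""]               # drop empty pieces
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))   # word i + " " + word i+1
}

bigram_tokenizer("Long hours, poor work life balance")
# "long hours" "hours poor" "poor work" "work life" "life balance"
```

Because "work life" and "life balance" overlap by one word, they tend to occur in the same reviews, which is one reason such bigrams end up close together in the dendrogram.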