Put it all together: a text-based dendrogram

It's time to put your skills to work and make your first text-based dendrogram. Remember, dendrograms reduce information to help you make sense of the data. This is much like how an average tells you something, but not everything, about a population. Both can be misleading. With text, many of the clusters are often nonsensical, but some valuable clusters may also appear.

A peculiarity of TDM and DTM objects is that you have to convert them to matrices first (with as.matrix()) before passing them to the dist() function.
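
As a rough illustration of that conversion step, the sketch below builds a tiny term-document matrix and converts it before computing distances. The three example sentences and the names ex_docs, ex_tdm, ex_m, and ex_dist are made up for this sketch and are not part of the course data; it assumes the tm package is installed.

```r
library(tm)

# Hypothetical toy documents, not the course's chardonnay tweets
ex_docs <- c("chardonnay with marvin gaye",
             "marvin gaye tonight",
             "cheap chardonnay tonight")

# Build a term-document matrix: one row per term, one column per document
ex_tdm <- TermDocumentMatrix(VCorpus(VectorSource(ex_docs)))
class(ex_tdm)

# Convert the sparse TDM object to an ordinary dense matrix
ex_m <- as.matrix(ex_tdm)
class(ex_m)

# dist() measures distances between rows, so here between terms
ex_dist <- dist(ex_m)
ex_dist
```

Because terms sit in the rows of a TDM, the resulting distances compare terms rather than documents, which is what a word-level dendrogram needs.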

For the chardonnay tweets, you may have been surprised to see the soul music legend Marvin Gaye appear in the word cloud. Let's see whether the dendrogram picks up the same association.

This exercise is part of the course

Text Mining with Bag-of-Words in R

Exercise instructions

  • Create tweets_tdm2 by applying removeSparseTerms() on tweets_tdm. Use sparse = 0.975.
  • Create tdm_m by using as.matrix() on tweets_tdm2 to convert it to matrix form.
  • Create tweets_dist containing the distances of tdm_m using the dist() function.
  • Create a hierarchical cluster object called hc using hclust() on tweets_dist.
  • Make a dendrogram with plot() and hc.
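
The same sequence of steps can be tried on a toy corpus before working with the real tweets. This is only a sketch under assumptions: the five short documents and the toy_ names are invented for illustration, and sparse = 0.7 is chosen to suit five documents rather than the 0.975 used in the exercise. It assumes the tm package is installed.

```r
library(tm)

# Hypothetical toy corpus standing in for the tweets
toy_docs <- c("chardonnay marvin gaye",
              "marvin gaye chardonnay music",
              "chardonnay wine glass",
              "wine glass chardonnay",
              "chardonnay tonight")
toy_tdm <- TermDocumentMatrix(VCorpus(VectorSource(toy_docs)))

# Drop terms that appear in too few documents
# (with 5 documents and sparse = 0.7, terms in only one document are removed)
toy_tdm2 <- removeSparseTerms(toy_tdm, sparse = 0.7)

# Convert to a plain matrix, then compute term-to-term distances
toy_m <- as.matrix(toy_tdm2)
toy_dist <- dist(toy_m)

# Hierarchical clustering on the distances, then draw the dendrogram
toy_hc <- hclust(toy_dist)
plot(toy_hc)
```

In this toy data, "music" and "tonight" each occur in a single document, so they are filtered out before clustering. A lower sparse value removes more terms and gives a smaller, more readable dendrogram.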

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create tweets_tdm2
___ <- ___(___, ___)

# Create tdm_m
___ <- ___(___)

# Create tweets_dist
___ <- ___(___)

# Create hc
___ <- ___(___)

# Plot the dendrogram
___(___)