
Put it all together: a text-based dendrogram

It's time to put your skills to work and make your first text-based dendrogram. Remember, dendrograms reduce information to help you make sense of the data. This is much like how an average tells you something, but not everything, about a population. Both can be misleading. With text, many of the clusters are often nonsensical, but some valuable clusters may also appear.

A peculiarity of TDM and DTM objects is that you have to convert them to matrices first (with as.matrix()) before passing them to the dist() function.
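As a minimal sketch of that conversion step, the snippet below builds a TDM from a few made-up documents and converts it before computing distances. The documents and the names `toy_docs`, `toy_tdm`, `toy_m`, and `toy_dist` are assumptions for illustration only; they are not the course's chardonnay data.

```r
library(tm)

# Hypothetical toy documents standing in for real tweets
toy_docs <- c("chardonnay wine glass",
              "marvin gaye chardonnay",
              "wine glass dinner")

# Build a corpus and a term-document matrix (terms in rows, documents in columns)
toy_tdm <- TermDocumentMatrix(VCorpus(VectorSource(toy_docs)))

# Convert the sparse TDM into an ordinary numeric matrix
toy_m <- as.matrix(toy_tdm)

# dist() computes pairwise distances between rows (Euclidean by default),
# so here it measures distances between terms
toy_dist <- dist(toy_m)
```

Because dist() works on rows, a dendrogram built from a TDM groups terms; a DTM, with documents in rows, would group documents instead.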

For the chardonnay tweets, you may have been surprised to see the soul music legend Marvin Gaye appear in the word cloud. Let's see if the dendrogram picks up the same association.

This exercise is part of the course

Text Mining with Bag-of-Words in R


Instructions

  • Create tweets_tdm2 by applying removeSparseTerms() on tweets_tdm. Use sparse = 0.975.
  • Create tdm_m by using as.matrix() on tweets_tdm2 to convert it to matrix form.
  • Create tweets_dist containing the distances of tdm_m using the dist() function.
  • Create a hierarchical cluster object called hc using hclust() on tweets_dist.
  • Make a dendrogram with plot() and hc.
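The same sequence of steps can be sketched on a small made-up corpus. Everything here is an assumption for illustration: the documents, the `demo_` names, and the sparse value of 0.7, which is chosen only so that the single-occurrence term is dropped from a five-document toy set. The exercise itself uses tweets_tdm and sparse = 0.975.

```r
library(tm)

# Hypothetical toy documents; the exercise uses the chardonnay tweets instead
demo_docs <- c("chardonnay wine glass",
               "marvin gaye chardonnay",
               "wine glass dinner",
               "marvin gaye music",
               "chardonnay wine dinner")
demo_tdm <- TermDocumentMatrix(VCorpus(VectorSource(demo_docs)))

# Drop very sparse terms; 0.7 is an illustrative threshold for this tiny corpus
demo_tdm2 <- removeSparseTerms(demo_tdm, sparse = 0.7)

# Convert to a matrix, compute term-to-term distances, and cluster
demo_m <- as.matrix(demo_tdm2)
demo_dist <- dist(demo_m)
demo_hc <- hclust(demo_dist)

# Draw the dendrogram of the remaining terms
plot(demo_hc)
```

Lowering sparse removes more terms, which keeps the dendrogram readable; with real tweets the right value depends on how many documents and distinct terms you have.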

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create tweets_tdm2
___ <- ___(___, ___)

# Create tdm_m
___ <- ___(___)

# Create tweets_dist
___ <- ___(___)

# Create hc
___ <- ___(___)

# Plot the dendrogram
___(___)