ComenzarEmpieza gratis

Put it all together: a text-based dendrogram

Its time to put your skills to work to make your first text-based dendrogram. Remember, dendrograms reduce information to help you make sense of the data. This is much like how an average tells you something, but not everything, about a population. Both can be misleading. With text, there are often a lot of nonsensical clusters, but some valuable clusters may also appear.

A peculiarity of TDM and DTM objects is that you have to convert them first to matrices (with as.matrix()), before using them with the dist() function.

For the chardonnay tweets, you may have been surprised to see the soul music legend Marvin Gaye appears in the word cloud. Let's see if the dendrogram picks up the same.

Este ejercicio forma parte del curso

Text Mining with Bag-of-Words in R

Ver curso

Instrucciones del ejercicio

  • Create tweets_tdm2 by applying removeSparseTerms() on tweets_tdm. Use sparse = 0.975.
  • Create tdm_m by using as.matrix() on tweets_tdm2 to convert it to matrix form.
  • Create tweets_dist containing the distances of tdm_m using the dist() function.
  • Create a hierarchical cluster object called hc using hclust() on tweets_dist.
  • Make a dendrogram with plot() and hc.

Ejercicio interactivo práctico

Prueba este ejercicio completando el código de muestra.

# Create tweets_tdm2
___ <- ___(___, ___)

# Create tdm_m
___ <- ___(___)

# Create tweets_dist
___ <- ___(___)

# Create hc
___ <- ___(___)

# Plot the dendrogram
___(___)
Editar y ejecutar código