Put it all together: a text-based dendrogram
It's time to put your skills to work and make your first text-based dendrogram. Remember, dendrograms reduce information to help you make sense of the data, much as an average tells you something, but not everything, about a population. Both can be misleading. With text, many of the clusters are nonsensical, but some valuable clusters may also appear.
A peculiarity of TDM and DTM objects is that you have to convert them first to matrices (with as.matrix()) before using them with the dist() function.
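To see what dist() does once it has a plain matrix, here is a minimal sketch in base R. The matrix toy_m and its counts are made up for illustration and are not taken from the chardonnay tweets. It mimics the shape that as.matrix() gives you for a TDM: terms as rows, documents as columns.

```r
# Hypothetical term-document matrix: rows are terms, columns are documents.
# The counts are invented for this illustration.
toy_m <- matrix(
  c(2, 0, 1,
    2, 0, 1,
    0, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Terms = c("marvin", "gaye", "wine"),
    Docs  = c("d1", "d2", "d3")
  )
)

# dist() computes Euclidean distances between ROWS by default,
# so with terms as rows the result is a set of term-to-term distances.
toy_dist <- dist(toy_m)
print(toy_dist)
```

Terms that occur in the same documents with the same counts end up at distance zero, which is why they are merged first when the distances are clustered.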
For the chardonnay tweets, you may have been surprised to see the soul music legend Marvin Gaye appear in the word cloud. Let's see if the dendrogram picks up the same association.
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
- Create tweets_tdm2 by applying removeSparseTerms() on tweets_tdm. Use sparse = 0.975.
- Create tdm_m by using as.matrix() on tweets_tdm2 to convert it to matrix form.
- Create tweets_dist containing the distances of tdm_m using the dist() function.
- Create a hierarchical cluster object called hc using hclust() on tweets_dist.
- Make a dendrogram with plot() and hc.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Create tweets_tdm2
___ <- ___(___, ___)
# Create tdm_m
___ <- ___(___)
# Create tweets_dist
___ <- ___(___)
# Create hc
___ <- ___(___)
# Plot the dendrogram
___(___)
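
If you want to try the same sequence of steps outside the exercise, the sketch below runs it end to end on a tiny made-up corpus. Everything prefixed with toy_ is a hypothetical stand-in, since the course's tweets_tdm object is not available here, and sparse = 0.7 is chosen only because the toy corpus has five documents (the exercise itself uses 0.975). It assumes the tm package is installed.

```r
library(tm)

# Hypothetical stand-in for the chardonnay tweets
toy_tweets <- c(
  "marvin gaye and chardonnay",
  "chardonnay with marvin gaye tonight",
  "a glass of chardonnay",
  "marvin gaye on the radio",
  "chardonnay and cheese"
)

# Build a term-document matrix from the toy tweets
toy_corpus <- VCorpus(VectorSource(toy_tweets))
toy_tdm <- TermDocumentMatrix(toy_corpus)

# Drop terms that are missing from too large a share of documents
toy_tdm2 <- removeSparseTerms(toy_tdm, sparse = 0.7)

# Convert to a plain matrix, then compute term-to-term distances
toy_m <- as.matrix(toy_tdm2)
toy_dist <- dist(toy_m)

# Cluster hierarchically and draw the dendrogram
toy_hc <- hclust(toy_dist)
plot(toy_hc)
```

With so few documents the resulting tree is only a toy, but the order of operations is the one the instructions describe: trim sparse terms, convert to a matrix, compute distances, cluster, plot.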