1. Visualizing the dendrogram
As you recently learned, hierarchical clustering iteratively groups observations through pairwise comparisons until all observations are gathered into a single group. We can represent this grouping visually with a plot called a dendrogram, also known as a tree diagram.
2. Building the dendrogram
To build a dendrogram, let's start with the same six-player soccer lineup from our last video.
On the left we have the positions of the players, and on the right we will assemble a dendrogram as we iteratively group these observations.
3. Building the dendrogram
As before, we start the process of hierarchical clustering by taking the two closest observations and grouping them.
4. Building the dendrogram
Correspondingly, we can represent this grouping in the tree diagram.
5. Building the dendrogram
The dendrogram encodes a very important attribute of our grouping: the distance between the observations that were grouped.
This is captured by the height axis.
In this case, the distance between the two observations is 4.1, and correspondingly their shared branch sits at that height.
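To see this height in R, here is a minimal sketch. The coordinates in `players` are hypothetical values chosen for illustration, and `hc_players` is recreated here rather than taken from the previous video. The `height` component of an `hclust` object stores one merge distance per joining step, so its first entry is the distance at which the first pair was grouped.

```r
# Hypothetical (x, y) positions for a six-player lineup, chosen for illustration
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)

# Euclidean distance between every pair of players
dist_players <- dist(players, method = "euclidean")

# Complete-linkage hierarchical clustering (hypothetical recreation of hc_players)
hc_players <- hclust(dist_players, method = "complete")

# The first merge joins the two closest players, so its height
# equals the smallest pairwise distance
first_height <- hc_players$height[1]
round(first_height, 1)
```

With these made-up coordinates, players 1 and 2 are the closest pair, at a distance of sqrt(17), or about 4.1.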
6. Building the dendrogram
As before, we form the next group by comparing the pairwise distances between observations with the linkage criteria-based distances between observations and existing groups.
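The linkage comparison can be sketched by hand. This assumes the complete linkage criterion used in this lesson, and `complete_linkage()` is a hypothetical helper written only for illustration (it is not part of base R). The distance between two groups is the largest distance between any member of one group and any member of the other. The player coordinates are again hypothetical.

```r
# Hypothetical (x, y) positions for a six-player lineup, chosen for illustration
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)

# Full matrix of pairwise Euclidean distances
d <- as.matrix(dist(players, method = "euclidean"))

# Hypothetical helper: complete-linkage distance between two sets of row
# indices, i.e. the largest distance between a member of one set and a
# member of the other
complete_linkage <- function(d, group_a, group_b) {
  max(d[group_a, group_b])
}

# Distance from the existing group {1, 2} to player 4
complete_linkage(d, c(1, 2), 4)
```

Here the result is the larger of the distances from player 1 and player 2 to player 4, which is sqrt(145), about 12.04.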
7. Building the dendrogram
The first group with more than two observations now forms from observations one, two, and four, and is represented accordingly in the dendrogram.
8. Building the dendrogram
The common branch shared by these three observations again encodes distance; more specifically, its height is the linkage criteria-based distance among all three observations.
This is a very important feature of the dendrogram.
It allows us to say something very concrete about our grouped observations at any given height.
Remember that we chose Euclidean distance as our distance metric and the complete method as our linkage criterion, which uses the maximum distance between group members.
So in this case, we can look at this dendrogram and say that the members of this branch, observations one, two, and four, are all within a Euclidean distance of 12 or less of one another.
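We can check this property in R with a minimal sketch, again using hypothetical coordinates for `players`. Cutting the tree with `cutree()` at a branch's height recovers that branch's members, and under complete linkage the largest Euclidean distance among them equals the branch height.

```r
# Hypothetical (x, y) positions for a six-player lineup, chosen for illustration
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)
d <- as.matrix(dist(players, method = "euclidean"))
hc_players <- hclust(dist(players, method = "euclidean"), method = "complete")

# Height of the third merge (with this data, the branch joining players 1, 2 and 4)
h <- hc_players$height[3]

# Cluster membership when the tree is cut at that height
groups <- cutree(hc_players, h = h)

# Members of the branch that contains player 1
members <- unname(which(groups == groups[1]))
members

# The largest pairwise distance among the members equals the branch height
isTRUE(all.equal(max(d[members, members]), h))
```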
We will leverage this attribute of the tree in our next video, but in the meantime, let's continue to build the dendrogram.
9. Building the dendrogram
We continue iteratively joining observations and groups...
10. Building the dendrogram
...until all are joined into a single group.
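The finished tree can be inspected in R. In this minimal sketch (hypothetical coordinates again), each row of the `merge` component of an `hclust` object records one joining step, so six observations need five steps. Because complete linkage never produces a merge lower than an earlier one, the heights are non-decreasing, and the final merge, which gathers everyone, sits at the largest pairwise distance.

```r
# Hypothetical (x, y) positions for a six-player lineup, chosen for illustration
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)
dist_players <- dist(players, method = "euclidean")
hc_players <- hclust(dist_players, method = "complete")

# One row per joining step: n observations need n - 1 merges
nrow(hc_players$merge)

# Under complete linkage, merge heights never decrease
!is.unsorted(hc_players$height)

# The final merge contains every player, so its height is the largest pairwise distance
isTRUE(all.equal(tail(hc_players$height, 1), max(dist_players)))
```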
11. Plotting the dendrogram
Of course, we don't actually do this manually in R.
To visualize a dendrogram, all we need to do is plot the corresponding hclust object.
In this case, we will reuse the hc_players object we created in the previous video to plot our dendrogram.
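A minimal sketch of the plotting step follows. Since the previous video's `hc_players` is not available here, it is recreated from hypothetical coordinates using Euclidean distance and complete linkage; calling `plot()` on an `hclust` object dispatches to `plot.hclust()`, which draws the dendrogram with height on the vertical axis.

```r
# Hypothetical (x, y) positions for a six-player lineup, chosen for illustration
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)

# Hypothetical recreation of hc_players; the object from the previous video may differ
hc_players <- hclust(dist(players, method = "euclidean"), method = "complete")

# Draw the dendrogram; empty xlab and sub suppress the default axis captions
plot(hc_players, main = "Complete linkage dendrogram", xlab = "", sub = "")
```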
12. Let's practice!
Now that you know how to visualize hierarchical clustering, let's explore the impact the choice of linkage criterion can have on the dendrogram.