
Cutting the tree

1. Cutting the tree

In the previous exercises you learned how to plot and interpret a dendrogram. Now let's learn how to use this visualization to both identify our clusters and highlight some of their key characteristics.

2. Cutting the tree

Let's continue our work with the soccer player dendrogram. Remember that the distance between the observations was calculated using Euclidean distance and that we used the complete linkage criterion. This means that, at any given branch, all members that share this branch will have a Euclidean distance from one another no greater than the height of that branch. We can leverage this idea both to select our clusters and to characterize the relationships among their members.

3. Cutting the tree

To do so we can cut our tree at any desired height. Let's choose 15 for now. This means that we remove all links above this cut point and we create our clusters below.

4. Cutting the tree

In this case two clusters are formed. Using this height cutoff we can already ascribe a characteristic to them: all members of each cluster will have a Euclidean distance from one another no greater than our cut height of 15. This statement is a function of our choice of height, distance metric and linkage criterion. This information can be very valuable as our data gets more features and becomes harder to plot using only two dimensions.
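To make this guarantee concrete, here is a minimal sketch in base R. The players data frame below is a hypothetical stand-in (the source does not list the actual positions), chosen so that a cut at 15 yields two clusters. It uses cutree with the h parameter to get the memberships and then checks the largest pairwise distance inside each cluster.

```r
# Hypothetical positions for six players (illustrative values, not from the source)
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)

# Euclidean distance and complete linkage, as in the text
dist_players <- dist(players, method = "euclidean")
hc_players <- hclust(dist_players, method = "complete")

# Cut the tree at a height of 15
cut_height <- 15
clusters <- cutree(hc_players, h = cut_height)

# Largest pairwise distance inside each cluster
dist_matrix <- as.matrix(dist_players)
max_within <- sapply(split(seq_along(clusters), clusters), function(idx) {
  max(dist_matrix[idx, idx])
})

print(table(clusters))
print(max_within)
print(all(max_within <= cut_height))
```

With complete linkage, every value in max_within should come out at or below the cut height. With a different linkage criterion this check is not guaranteed to hold.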

5. Coloring the dendrogram - height

We can visualize the clusters that form at any given height by using the dendextend package to color our dendrogram plot. To do so, we must first convert the hclust object into a dendrogram object with the function as.dendrogram(). The next step is to use the color_branches() function from dendextend to color the branches based on a desired criterion. In this case we want to cut at a height of 15, which we specify with the parameter h. Finally, we use the plot() function to plot the newly colored dendrogram.
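The three steps just described might look like the sketch below. It assumes the dendextend package is installed, and it reuses a hypothetical six-player data frame, since the source does not show the underlying data or object names.

```r
library(dendextend)  # assumed to be installed

# Hypothetical positions for six players (illustrative values, not from the source)
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)
hc_players <- hclust(dist(players, method = "euclidean"), method = "complete")

# Step 1: convert the hclust object into a dendrogram object
dend_players <- as.dendrogram(hc_players)

# Step 2: color the branches by cutting at a height of 15
dend_colored <- color_branches(dend_players, h = 15)

# Step 3: plot the colored dendrogram
plot(dend_colored)
abline(h = 15, lty = 2)  # optional: mark the cut height
```

Changing the value passed to h is all it takes to try a different cut height.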

6. Coloring the dendrogram - height

We can use this visual to further explore heights at which we may want to create our clusters. Let's say we believe a height of 10 would be more appropriate, as shown in this plot with a proposed red line.

7. Coloring the dendrogram - height

We perform the steps to color the tree using an h equal to 10. The resulting dendrogram now has four colors for the corresponding four clusters.
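Besides reading candidate heights off the plot, we can also inspect them numerically. The sketch below uses only base R and a hypothetical six-player data frame (the source does not list the real values). It prints the heights at which branches merge and counts the clusters produced by each candidate cut.

```r
# Hypothetical positions for six players (illustrative values, not from the source)
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)
hc_players <- hclust(dist(players, method = "euclidean"), method = "complete")

# Heights at which branches merge, from the bottom of the tree to the top
merge_heights <- hc_players$height
print(round(merge_heights, 2))

# Number of clusters produced by each candidate cut height
candidate_heights <- c(10, 15)
n_clusters <- sapply(candidate_heights, function(h) {
  length(unique(cutree(hc_players, h = h)))
})
print(data.frame(h = candidate_heights, clusters = n_clusters))
```

Any cut height that falls between two consecutive merge heights gives the same set of clusters, so only the gaps between those heights matter when choosing h.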

8. Coloring the dendrogram - K

You can also use color_branches() to color the tree by a number of clusters, k, instead of a height, simply by providing the desired k. Here this results in two clusters, formed by cutting the last grouping, that is, the final merge at the top of the tree.
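A minimal sketch of coloring by k, again assuming dendextend is installed and using a hypothetical six-player data frame in place of the data from the slides:

```r
library(dendextend)  # assumed to be installed

# Hypothetical positions for six players (illustrative values, not from the source)
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)
hc_players <- hclust(dist(players, method = "euclidean"), method = "complete")
dend_players <- as.dendrogram(hc_players)

# Color the branches so that exactly two clusters are highlighted
dend_k2 <- color_branches(dend_players, k = 2)
plot(dend_k2)

# For this illustrative data, k = 2 and h = 15 describe the same partition
same_partition <- all(stats::cutree(hc_players, k = 2) ==
                      stats::cutree(hc_players, h = 15))
print(same_partition)
```

Specifying k is convenient when we know how many clusters we want, while h is the better fit when we care about the maximum distance allowed within a cluster.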

9. cutree() using height

Just as color_branches() can use either a height or k, the cutree() function that we first used to make clusters can assign cluster memberships at a given height through the parameter h. As before, we can append this vector of cluster assignments to our data frame for further exploration.
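As a sketch of that workflow, the example below cuts at a height of 15 and appends the assignments to the data frame using base R only. The players data frame and the column name cluster are illustrative assumptions, not taken from the source.

```r
# Hypothetical positions for six players (illustrative values, not from the source)
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)
hc_players <- hclust(dist(players, method = "euclidean"), method = "complete")

# Assign cluster memberships by cutting the tree at a height of 15
clusters_h15 <- cutree(hc_players, h = 15)

# Append the assignments to a copy of the original data frame
players_clustered <- players
players_clustered$cluster <- as.integer(clusters_h15)
print(players_clustered)

# Example of further exploration: mean position per cluster
cluster_means <- aggregate(cbind(x, y) ~ cluster, data = players_clustered, FUN = mean)
print(cluster_means)
```

From here the cluster column can be used like any other variable, for instance to color a scatter plot or to summarize each group.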

10. Let's practice!

Now that you know how to visualize and explore the results of your hierarchical clustering work, let's try it out.