
Introduction to hierarchical clustering

1. Introduction to hierarchical clustering

In this chapter, you will learn about a different method of clustering called hierarchical clustering.

2. Hierarchical clustering

Hierarchical clustering is used when the number of clusters is not known ahead of time. This is different from k-means clustering, where you must first specify the number of clusters and then execute the algorithm. There are two approaches to hierarchical clustering: bottom-up (agglomerative) and top-down (divisive). This course will focus on bottom-up clustering.

3. Simple example

To gain intuition about this, let's start with a simple example of five observations each with two features.
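As a concrete illustration, here is one way to set up such data in R. The specific values are made up for this sketch, chosen only to give two nearby pairs of points and one point off on its own:

```r
# Hypothetical data: five observations (rows), two features (columns)
x <- matrix(c(1.0, 1.2,
              1.1, 1.0,
              4.0, 4.2,
              4.1, 4.0,
              2.5, 5.0),
            ncol = 2, byrow = TRUE)

plot(x, pch = 19)  # scatter plot of the five observations
```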

4. Five clusters

Bottom-up hierarchical clustering starts by assigning each point to its own cluster. So in this example, there are five clusters because there are five points. They are color-coded for reference.

5. Four clusters

The next step of bottom-up hierarchical clustering is to find the closest two clusters and to join them together into a single cluster. In this example, we now have four clusters since the purple and orange clusters are combined into a single cluster.

6. Three clusters

This process continues iteratively, finding the next pair of clusters that are closest to each other and combining them into a single cluster. Here you can see the green and red clusters are combined, resulting in a total of three clusters at this step in the algorithm.

7. Two clusters

Again, the hierarchical cluster algorithm continues by joining the two closest clusters together into a single cluster.

8. One cluster

This continues until there is only one cluster. Once there is only a single cluster, the hierarchical clustering algorithm stops. I have skipped a few details, like how distance is measured between clusters. Those details are not needed right now; I will cover them in the next few videos.
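The merging process described above can be sketched in a few lines of R. This is a naive illustration for intuition only, assuming Euclidean distance between points and single linkage (the distance between two clusters is the smallest point-to-point distance between them) -- in practice you would use hclust(), covered next:

```r
# A minimal, inefficient sketch of bottom-up clustering (single linkage).
# x is a numeric matrix: one observation per row, one feature per column.
agglomerate <- function(x) {
  D <- as.matrix(dist(x))                          # pairwise Euclidean distances
  clusters <- lapply(seq_len(nrow(x)), identity)   # each point starts as its own cluster
  merges <- list()
  while (length(clusters) > 1) {
    # find the closest pair of clusters
    best <- c(1, 2); best_d <- Inf
    for (i in seq_along(clusters)) {
      for (j in seq_along(clusters)) {
        if (i < j) {
          d <- min(D[clusters[[i]], clusters[[j]]])
          if (d < best_d) { best_d <- d; best <- c(i, j) }
        }
      }
    }
    # record the merge, then join the two clusters into one
    merges[[length(merges) + 1]] <- clusters[best]
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
  }
  merges
}

# With five observations, the loop performs exactly four merges: 5 -> 4 -> 3 -> 2 -> 1
x <- matrix(c(1.0, 1.2, 1.1, 1.0, 4.0, 4.2, 4.1, 4.0, 2.5, 5.0),
            ncol = 2, byrow = TRUE)
merge_steps <- agglomerate(x)
```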

9. Hierarchical clustering in R

Performing hierarchical clustering in R requires only one parameter -- the distances between the observations. There are many ways to calculate the distance between observations; for this class we will use standard Euclidean distance. This is calculated using the dist() function in R. The parameter to dist() is a matrix with the same structure as other matrices used in machine learning: one observation per row, one feature per column. The resulting distance matrix is then passed as the 'd' parameter to the hclust() function in R, which returns a hierarchical clustering model for interrogation and use. There are a few more parameters available in the hclust() function, but this is enough to get started with creating models. We will cover other typical options later, and as before, when you are ready, the R documentation for hclust() is a good resource.
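Putting those two steps together, a minimal end-to-end example looks like this (the data matrix is hypothetical, reusing the five-observation example from earlier):

```r
# Hypothetical data: five observations, two features each
x <- matrix(c(1.0, 1.2,
              1.1, 1.0,
              4.0, 4.2,
              4.1, 4.0,
              2.5, 5.0),
            ncol = 2, byrow = TRUE)

d <- dist(x)                  # Euclidean distances between all pairs of rows
hclust_model <- hclust(d = d) # fit the hierarchical clustering model
plot(hclust_model)            # draw the dendrogram of the merge sequence
```

With five observations, the fitted model records four merges -- one for each step from five clusters down to one.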

10. Let's practice!

OK, let's practice what you've learned in the coming exercises.