1. Introduction to hierarchical clustering
In this chapter, you will learn about a different method of clustering called hierarchical clustering.
2. Hierarchical clustering
Hierarchical clustering is used when the number of clusters is not known ahead of time. This is different from k-means clustering, where you first have to specify the number of clusters and then execute the algorithm.
There are two approaches to hierarchical clustering: bottom-up (also called agglomerative) and top-down (also called divisive). This course will focus on bottom-up clustering.
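To see the difference concretely, here is a minimal sketch in R with a hypothetical data matrix dat (made up for illustration): kmeans() must be told how many clusters to find, while hclust(), covered later in this video, needs no such count.

# Hypothetical data: 10 observations with 2 features each
dat <- matrix(rnorm(20), ncol = 2)

# k-means: the number of clusters (centers) must be specified up front
km <- kmeans(dat, centers = 2)

# hierarchical clustering: no cluster count is required
hc <- hclust(dist(dat))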
3. Simple example
To gain intuition about this, let's start with a simple example of five observations, each with two features.
4. Five clusters
Bottom-up hierarchical clustering starts by assigning each point to its own cluster. So in this example, there are five clusters because there are five points. They are color-coded for reference.
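As a sketch, such data might look like this in R; the coordinates are hypothetical, chosen only to mimic the slide.

# Five hypothetical observations, one per row, with two features each
x <- matrix(c(1.0, 1.2,
              1.1, 1.0,
              3.0, 3.1,
              3.2, 2.9,
              5.0, 1.0),
            ncol = 2, byrow = TRUE)

# Bottom-up clustering starts with each point as its own cluster,
# so at this stage there are five clusters
nrow(x)  # 5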
5. Four clusters
The next step of bottom-up hierarchical clustering is to find the two closest clusters and join them together into a single cluster. In this example, we now have four clusters because the purple and orange clusters have been combined into one.
6. Three clusters
This process continues iteratively, finding the next pair of clusters that are closest to each other and combining them into a single cluster. Here you can see the green and red clusters are combined, resulting in a total of three clusters at this step in the algorithm.
7. Two clusters
Again, the hierarchical clustering algorithm continues by joining the two closest clusters together into a single cluster.
8. One cluster
This continues until there is only one cluster. Once there is only a single cluster, the hierarchical clustering algorithm stops.
I have skipped a few details, like how distance is measured between clusters -- those details are not needed right now, but I will cover them in the next few videos.
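Putting the whole walk-through together, here is a sketch that inspects the joins on the hypothetical five-point matrix x from the earlier sketch; the merge component of a fitted model records which two clusters were combined at each step.

# Fit a model on the hypothetical five-point matrix x from above
hc <- hclust(dist(x))

# Each row of hc$merge is one join: negative entries refer to
# original points, positive entries to clusters formed in earlier steps.
# Five points yield four joins, going 5 -> 4 -> 3 -> 2 -> 1 clusters.
hc$merge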
9. Hierarchical clustering in R
Performing hierarchical clustering in R requires only one input -- the distances between the observations. There are many ways to calculate the distance between observations; for this course we will use standard Euclidean distance. This is calculated using the dist() function in R. The input to dist() is a matrix with the same structure as other matrices used in machine learning: one observation per row, one feature per column.
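As a minimal sketch, again with hypothetical data:

# Hypothetical data: one observation per row, one feature per column
x <- matrix(rnorm(10), ncol = 2)  # 5 observations, 2 features

# dist() computes Euclidean distances by default
d <- dist(x)
d  # pairwise distances between observations, shown as a lower triangle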
The resulting distance matrix is then passed in as the 'd' parameter to the hclust() function in R, which returns a hierarchical clustering model that you can interrogate and use.
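Continuing the sketch, the distance object d from above is all that hclust() needs:

# Fit the hierarchical clustering model
hc <- hclust(d = d)

# Printing the model reports the clustering method, the distance
# used, and the number of observations
hc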
There are a few more parameters available in the hclust() function, but this is enough to get started with creating models. We will cover other typical options later, and as before, the R documentation for hclust() is a good resource when you are ready.
10. Let's practice!
OK, let's practice what you've learned in the coming exercises.