
Making sense of the clusters

1. Making sense of the clusters

Over the last series of exercises, you have developed the tools you need to run hierarchical clustering and the intuition to understand the impact of each step. Now you will have a chance to use these skills by clustering a new dataset.

2. Wholesale dataset

You will work with 45 records of customer spending from a wholesale distributor. Each customer record has three features: spending on Milk, Grocery, and Frozen Food.

3. Wholesale dataset

The dataset will look like this. Unlike the soccer positions dataset, which had only two features (x and y), this dataset has three. As a result, we can't simply explore what the clusters mean from a single two-dimensional plot.

4. Exploring more than 2 dimensions

There are several approaches to overcome this. Once you have assigned the cluster memberships, you can make multiple plots of feature pairs and use color to distinguish the clusters. This can be helpful, but each plot captures only one angle of the complex interactions at a time, and the number of plots quickly gets out of hand as the number of features grows.

Alternatively, you can use a dimensionality reduction method such as principal component analysis to project your multi-dimensional data onto two dimensions and color the points by cluster assignment. This can help you see whether your observations clustered well and whether the clusters are well separated. However, the new axes are combinations of the original features, so this type of plot is difficult to interpret and doesn't shed much light on the characteristics of the clusters.

Finally, you can simply explore distribution characteristics, such as the mean and median of each feature, within your clusters. By comparing these summary statistics between clusters, you can begin to build a narrative of what makes the observations within a cluster similar to each other and different from the observations in the other clusters.
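The last of these approaches, comparing summary statistics between clusters, can be sketched in a few lines. The example below is only an illustration under stated assumptions: the six spending records are invented (they are not the wholesale data), the helper names `hierarchical_clusters` and `summarize` are hypothetical, and the clustering is a naive, dependency-free version of hierarchical clustering with complete linkage and Euclidean distance. In practice you would likely rely on a library routine for the clustering step.

```python
from itertools import combinations
from math import dist
from statistics import mean, median

# Invented spending records (Milk, Grocery, Frozen); NOT the course dataset.
FEATURES = ("Milk", "Grocery", "Frozen")
customers = [
    (11000, 12500, 900),
    (10400, 11800, 1100),
    (12100, 13900, 700),
    (1500, 2100, 6200),
    (1900, 1700, 5800),
    (1200, 2600, 7100),
]


def complete_linkage(a, b, points):
    """Distance between two clusters: the largest pairwise Euclidean distance."""
    return max(dist(points[i], points[j]) for i in a for j in b)


def hierarchical_clusters(points, k):
    """Naive agglomerative clustering: merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # combinations() yields index pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: complete_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def summarize(points, clusters):
    """Mean and median of every feature within each cluster (labels start at 1)."""
    summary = {}
    for label, members in enumerate(clusters, start=1):
        rows = [points[i] for i in members]
        summary[label] = {
            name: {"mean": mean(column), "median": median(column)}
            for name, column in zip(FEATURES, zip(*rows))
        }
    return summary


clusters = hierarchical_clusters(customers, k=2)
summary = summarize(customers, clusters)

for label, stats in summary.items():
    print(f"cluster {label} (n={len(clusters[label - 1])})")
    for name, s in stats.items():
        print(f"  {name:<8} mean={s['mean']:>9.1f}  median={s['median']:>9.1f}")
```

With this toy data, one cluster spends heavily on Milk and Grocery while the other spends mostly on Frozen food, which is the kind of contrast you would look for when describing real segments.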

5. Segment the customers

In the next series of exercises, you will use this data to identify the clusters of customers that form based on their spending. This is a common use case of cluster analysis, where the desired outcome is to segment customers based on their behavior. Once the segments are identified, we can explore their common characteristics to gain insights into our customer base and design value-driven opportunities from this data. Let's get started.