Making sense of the K-means clusters

1. Making sense of the K-means clusters

Throughout this chapter you have built an understanding of, and an intuition for, how to use the k-means algorithm and its associated techniques to perform cluster analysis. Now it's time to put these tools into practice by revisiting the wholesale dataset.

2. Wholesale dataset

You have learned a lot since you last looked at this data, so let's have a quick refresher. The wholesale dataset is an exercise in clustering the customers of a wholesale distributor. This use of clustering is also known as market segmentation. The wholesale data consists of 45 observations of client purchases of milk, grocery, and frozen food. The data is stored in the data frame customers_spend.

3. Segmenting with hierarchical clustering

At the end of chapter two you used hierarchical clustering to segment the customers into four clusters using a height that seemed appropriate based on the structure of the tree.

4. Segmenting with hierarchical clustering

You then characterized these customer segments by calculating their average spending in each category. From this analysis you learned that segments one, three, and four consist of around five observations each, and that their members collectively spend more on one category relative to the others. In a real-world scenario, a finding like this could be used to provide more customized advertising or other targeting for these groups based on their spending habits. Do you think the result would be the same if you used a different clustering method?
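As a reminder of what that workflow looks like, here is a minimal sketch in R. It does not use the real wholesale data: the data frame below is a synthetic stand-in with 45 rows, and the column names (Milk, Grocery, Frozen), the Euclidean distance, and the complete linkage are all assumptions made for illustration. The sketch also asks cutree() for four clusters directly with k = 4 rather than cutting at a height, because a sensible height depends on the actual data.

```r
# Synthetic stand-in for customers_spend (NOT the real wholesale data).
# 45 customers, three spending categories; column names are assumed.
set.seed(42)
customers_spend <- data.frame(
  Milk    = round(rlnorm(45, meanlog = 8.5, sdlog = 0.8)),
  Grocery = round(rlnorm(45, meanlog = 8.8, sdlog = 0.8)),
  Frozen  = round(rlnorm(45, meanlog = 7.5, sdlog = 0.9))
)

# Hierarchical clustering on Euclidean distances with complete linkage
# (both are assumed choices for this sketch).
dist_customers <- dist(customers_spend)
hc_customers   <- hclust(dist_customers, method = "complete")

# Cut the tree into four segments (k = 4 instead of a height h).
segment <- cutree(hc_customers, k = 4)

# Number of customers in each segment.
segment_sizes <- table(segment)
print(segment_sizes)

# Mean spending per category within each segment.
segment_means <- aggregate(customers_spend,
                           by = list(segment = segment),
                           FUN = mean)
print(segment_means)
```

With the real data you would replace the synthetic data frame with customers_spend as loaded in the exercises, and you could pass h = ... to cutree() to cut at the height you chose from the dendrogram.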

5. Segmenting with K-means

Let's find out. In the following exercises you will use the k-means tools you have learned in this chapter in three steps. First, you will estimate the best value of k by finding the k that maximizes the average silhouette width. Then you will use this value of k to create a k-means model. Finally, you will characterize these k clusters by calculating their average spending in each category, as you did in the previous chapter.

As you progress through these exercises, feel free to look back and compare your results with the corresponding hierarchical clustering exercises. I encourage you to see whether the results differ and to speculate as to why that may be the case. Most importantly, remember that both of these clustering methods are descriptive, not prescriptive. In other words, they provide different lenses through which you can understand your underlying data, but the choice of which one to use, and how to use it correctly, depends heavily on the question at hand and on an understanding of the underlying subject matter.
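The three steps can be sketched in R as follows. This is one possible way to do it, not necessarily the exact code used in the exercises. The data frame is again a synthetic stand-in with assumed column names (Milk, Grocery, Frozen), the candidate range 2 to 10 and nstart = 20 are illustrative choices, and the silhouette widths are computed with silhouette() from the cluster package on the kmeans() assignments.

```r
library(cluster)  # provides silhouette()

# Synthetic stand-in for customers_spend (NOT the real wholesale data).
# 45 customers, three spending categories; column names are assumed.
set.seed(42)
customers_spend <- data.frame(
  Milk    = round(rlnorm(45, meanlog = 8.5, sdlog = 0.8)),
  Grocery = round(rlnorm(45, meanlog = 8.8, sdlog = 0.8)),
  Frozen  = round(rlnorm(45, meanlog = 7.5, sdlog = 0.9))
)

dist_customers <- dist(customers_spend)

# Step 1: average silhouette width for each candidate k.
# The silhouette is undefined for k = 1, so the search starts at 2.
k_values  <- 2:10
sil_width <- sapply(k_values, function(k) {
  model <- kmeans(customers_spend, centers = k, nstart = 20)
  sil   <- silhouette(model$cluster, dist_customers)
  mean(sil[, "sil_width"])
})
best_k <- k_values[which.max(sil_width)]

# Step 2: fit the k-means model with the selected k.
model_customers <- kmeans(customers_spend, centers = best_k, nstart = 20)

# Step 3: characterize each cluster by its mean spending per category.
cluster_means <- aggregate(customers_spend,
                           by = list(cluster = model_customers$cluster),
                           FUN = mean)

print(data.frame(k = k_values, sil_width = sil_width))
print(best_k)
print(cluster_means)
```

Because kmeans() starts from random centers, set.seed() and a larger nstart make the result more reproducible. The cluster means from step 3 can be compared directly with the per-segment means from the hierarchical clustering.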

6. Let's cluster!

So, what are you waiting for? Let's see how k-means segments our customers.