Visualize and interpret segmentation solutions

1. Visualize and interpret segmentation solutions

Finally, we will reap the fruits of our segmentation work by exploring the segments we built.

2. Methods to explore segments

Once the segments are built, the standard way to explore them is to calculate the average, median or other percentile values for each variable, grouped by the segment label. Here, we explore how the segments relate to the original product purchases and whether we can give them a name. Another approach is to calculate the relative importance of each variable by segment. You can explore this and other methods in more depth in another DataCamp course called Customer Segmentation in Python. Finally, the summary data is more easily interpretable when it's visualized - a good method is a heatmap from the seaborn package.
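As a rough illustration of these two summaries, here is a minimal sketch on a tiny made-up table. The data, the column name `segment` and the variable names are assumptions for the example only. Relative importance is computed here as the segment average divided by the population average, minus one, which is one common definition and may differ from the one used elsewhere.

```python
import pandas as pd

# Hypothetical toy data: two product columns plus a segment label.
toy = pd.DataFrame({
    "Fresh":   [100, 120, 10, 20],
    "Grocery": [10, 30, 200, 220],
    "segment": [0, 0, 1, 1],
})

# Average and median of every variable, grouped by the segment label.
segment_stats = toy.groupby("segment").agg(["mean", "median"])

# Relative importance (assumed definition): segment average divided by
# the population average, minus 1. Values near 0 mean "about average".
segment_means = toy.groupby("segment").mean()
population_means = toy.drop(columns="segment").mean()
relative_importance = segment_means / population_means - 1

print(segment_stats)
print(relative_importance.round(2))
```

Positive values mark variables a segment buys more of than the average customer, negative values mark variables it buys less of.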

3. Analyze average K-means segmentation attributes

We start by using our original wholesale dataset with a segment column that we previously created as wholesale_kmeans4. We group by the segment label and calculate average values. We can see that the four segments have very different average values for each of the columns. Still, it's not that easy to interpret, so we'll plot a heatmap that adds color labeling on top of the values.
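The grouping step could look roughly like the sketch below. The DataFrame here is a small stand-in with invented numbers, and the label column name `segment` is an assumption. In the course, wholesale_kmeans4 is the dataset built in the earlier steps.

```python
import pandas as pd

# Stand-in for wholesale_kmeans4: invented values, assumed column names.
wholesale_kmeans4 = pd.DataFrame({
    "Fresh":   [12000, 14000, 3000, 5000, 9000, 7000, 30000, 26000],
    "Milk":    [2000, 4000, 15000, 17000, 1000, 3000, 20000, 24000],
    "Grocery": [9000, 11000, 4000, 6000, 2000, 2000, 25000, 27000],
    "segment": [0, 0, 1, 1, 2, 2, 3, 3],
})

# Group by the segment label and take the average of every column.
kmeans4_averages = wholesale_kmeans4.groupby("segment").mean()

print(kmeans4_averages.round(0))
```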

4. Plot average K-means segmentation attributes

We call the heatmap function from the seaborn package on the previously created segment summary dataset. Also, we use the .T attribute to transpose it so it's more interpretable - it's a matter of personal taste if you want to do this. Now, we can see the average purchase values for each customer segment. It's clear that segment 0 is mostly grocery shoppers, segment 1 is mostly milk and frozen product buyers, segment 2 purchases mostly fresh products yet more lightly than segment 1, and finally segment 3 is a heavy buyer of fresh, milk, grocery, and detergents and paper products.
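A minimal sketch of the plotting step, assuming a summary table like the one described above. The numbers are invented placeholders, not the course results, and the colormap and annotation settings are just one reasonable choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical segment summary (average purchases per segment).
kmeans4_averages = pd.DataFrame(
    {
        "Fresh":   [13000, 4000, 8000, 28000],
        "Milk":    [3000, 16000, 2000, 22000],
        "Grocery": [10000, 5000, 2000, 26000],
    },
    index=pd.Index([0, 1, 2, 3], name="segment"),
)

# Transpose with .T so products are rows and segments are columns,
# then draw the heatmap with the average values printed in each cell.
ax = sns.heatmap(kmeans4_averages.T, cmap="YlGnBu", annot=True, fmt=".0f")
ax.set_title("Average purchases by K-means segment")
plt.tight_layout()
# In an interactive session, plt.show() would display the figure.
```

Dropping the `.T` gives the same picture with segments as rows instead of columns.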

5. Plot average NMF segmentation attributes

Now, we will explore the non-negative matrix factorization (NMF) solution. We start by using our original wholesale dataset with a segment column that we previously created as wholesale_nmf4. Same as with K-means, we group by the segment label and calculate average values. Then, we plot it as a heatmap to compare it with the K-means solution. There are some differences: we see that segment 0 is a heavy fresh product buyer, while segment 1 buys mostly groceries, milk, and detergents and paper. Segment 2, on the other hand, is also heavy on milk and groceries, but buys more milk and fewer groceries than segment 1. Finally, segment 3 buys mostly frozen and fresh products.
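One way to put the two solutions next to each other before plotting is sketched below. Both DataFrames are small stand-ins with invented values and an assumed `segment` column. Keep in mind that segment numbers are arbitrary labels, so segment 0 in one model does not need to correspond to segment 0 in the other.

```python
import pandas as pd

# Invented purchase values shared by both stand-in datasets.
purchases = pd.DataFrame({
    "Fresh":   [12000, 14000, 3000, 5000, 9000, 7000, 30000, 26000],
    "Milk":    [2000, 4000, 15000, 17000, 1000, 3000, 20000, 24000],
    "Grocery": [9000, 11000, 4000, 6000, 2000, 2000, 25000, 27000],
})

# Stand-ins for wholesale_kmeans4 and wholesale_nmf4 (hypothetical labels).
wholesale_kmeans4 = purchases.assign(segment=[0, 0, 1, 1, 2, 2, 3, 3])
wholesale_nmf4 = purchases.assign(segment=[3, 0, 1, 1, 2, 0, 3, 2])

# Same summary for both models: average values grouped by segment label.
kmeans4_averages = wholesale_kmeans4.groupby("segment").mean()
nmf4_averages = wholesale_nmf4.groupby("segment").mean()

# Side-by-side table with one column block per model.
comparison = pd.concat(
    [kmeans4_averages, nmf4_averages], axis=1, keys=["kmeans", "nmf"]
)
print(comparison.round(0))
```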

6. Let's build 3-segment solutions!

Fantastic progress! Now, you will build segments on your own, and you will test a different k value. In the video we used the 4-segment solution suggested by the elbow criterion method; in the upcoming exercises you will test the 3-segment solution with both models. Good luck!
