Profile and interpret segments

1. Profile and interpret segments

Great work on finding the right number of clusters to begin our segmentation! Now we will learn how to profile and interpret our segments.

2. Approaches to build customer personas

There are multiple ways to build customer personas. You have already seen one approach: assign the cluster label to each customer in the original dataset and then calculate the average values for each cluster. Another approach is to use snake plots, a line chart that compares RFM values across segments. A third approach is to calculate the relative importance of each cluster's attributes compared to the population average.

3. Summary statistics of each cluster

OK! We will now calculate the average RFM values for each of the two segments we built previously. First, we create a new DataFrame that combines the RFM values from our original dataset with the cluster labels from the 2-cluster k-means solution. Then we group the data by cluster and calculate the average of each RFM value. Finally, we repeat the calculation for a 3-cluster solution and compare the two.
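A minimal sketch of these steps in Python, assuming pandas and scikit-learn, a small hypothetical RFM table in place of the real datamart_rfm, the column names Recency, Frequency and MonetaryValue, and standardized features as the k-means input. The helper cluster_summary and the random_state value are illustrative choices, not the course's own code.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy RFM table standing in for datamart_rfm (one row per customer)
datamart_rfm = pd.DataFrame(
    {
        "Recency": [5, 10, 200, 250, 30, 40, 300, 15],
        "Frequency": [50, 40, 2, 1, 20, 15, 1, 45],
        "MonetaryValue": [900.0, 800.0, 20.0, 10.0, 300.0, 250.0, 5.0, 850.0],
    },
    index=pd.Index([101, 102, 103, 104, 105, 106, 107, 108], name="CustomerID"),
)

# Assumption: features are standardized before running k-means
datamart_normalized = StandardScaler().fit_transform(datamart_rfm)


def cluster_summary(n_clusters):
    """Fit k-means and return the average RFM values (plus customer count) per cluster."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=1)
    kmeans.fit(datamart_normalized)
    # Attach labels to the ORIGINAL values so the averages stay in real units
    labeled = datamart_rfm.assign(Cluster=kmeans.labels_)
    return labeled.groupby("Cluster").agg(
        Recency=("Recency", "mean"),
        Frequency=("Frequency", "mean"),
        MonetaryValue=("MonetaryValue", "mean"),
        Count=("Recency", "count"),
    ).round(1)


summary_k2 = cluster_summary(2)  # 2-segment solution
summary_k3 = cluster_summary(3)  # 3-segment solution
print(summary_k2)
print(summary_k3)
```

The labels are attached to the unscaled table on purpose: averages of standardized values are hard to read as days, purchases or money.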

4. Summary statistics of each cluster

As you can see, there are clear differences between the 2-segment and 3-segment solutions. While the former is simpler, the 3-segment solution splits customers more finely and therefore gives more insights.

5. Snake plots to understand and compare segments

Great! Now we will learn about snake plots. A snake plot is a market research visualization technique that plots each segment's RFM values on a line chart. First, we need to normalize the data so that the RFM values are on a comparable scale. Then we plot each cluster's average values on a line plot.
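To see why normalization is needed, the sketch below compares column spreads before and after scaling. It assumes scikit-learn's StandardScaler as the normalization step (the lesson only says the data is normalized; other transforms, such as taking logs before scaling, are also common) and uses hypothetical toy values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy RFM values: the three columns live on very different scales
datamart_rfm = pd.DataFrame(
    {
        "Recency": [5, 10, 200, 250, 30, 40, 300, 15],
        "Frequency": [50, 40, 2, 1, 20, 15, 1, 45],
        "MonetaryValue": [900.0, 800.0, 20.0, 10.0, 300.0, 250.0, 5.0, 850.0],
    }
)

# Raw standard deviations differ widely, so raw values are not comparable on one axis
print(datamart_rfm.std(ddof=0).round(1))

# Assumption: StandardScaler as the normalization step (mean 0, std 1 per column)
datamart_normalized = StandardScaler().fit_transform(datamart_rfm)

col_means = datamart_normalized.mean(axis=0)
col_stds = datamart_normalized.std(axis=0)
print(col_means.round(6), col_stds.round(6))
```

After scaling, every column is expressed in standard deviations from its own mean, so the three attributes can share one y-axis.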

6. Prepare data for a snake plot

Let's code the snake plot now! First, we create a DataFrame from our normalized NumPy array: we pass the array to the pandas DataFrame constructor and reuse the index and columns from the original datamart_rfm. Then we assign the cluster labels from the datamart_rfm_k3 dataset. For easier plotting, we melt the dataset into a long format. Melting turns the three RFM columns into two: a column called attribute, which stores the name of the metric, and a second column, which stores its value.
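The data preparation could look like this, assuming the normalized array came from StandardScaler, a CustomerID index, and RFM column names Recency, Frequency and MonetaryValue. The cluster labels are hard-coded hypothetical values standing in for the k-means output stored in datamart_rfm_k3, and the names datamart_normalized, datamart_melt and Value are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy RFM table standing in for datamart_rfm
datamart_rfm = pd.DataFrame(
    {
        "Recency": [5, 10, 200, 250, 30, 40, 300, 15],
        "Frequency": [50, 40, 2, 1, 20, 15, 1, 45],
        "MonetaryValue": [900.0, 800.0, 20.0, 10.0, 300.0, 250.0, 5.0, 850.0],
    },
    index=pd.Index([101, 102, 103, 104, 105, 106, 107, 108], name="CustomerID"),
)
# Hypothetical 3-cluster labels standing in for the k-means output in datamart_rfm_k3
datamart_rfm_k3 = datamart_rfm.assign(Cluster=[0, 0, 2, 2, 1, 1, 2, 0])

# Assumption: StandardScaler produced the normalized NumPy array
normalized_array = StandardScaler().fit_transform(datamart_rfm)

# Wrap the array in a DataFrame, reusing datamart_rfm's index and column names
datamart_normalized = pd.DataFrame(
    normalized_array, index=datamart_rfm.index, columns=datamart_rfm.columns
)
# Add the cluster labels (pandas aligns them on the CustomerID index)
datamart_normalized["Cluster"] = datamart_rfm_k3["Cluster"]

# Melt the three RFM columns into long format: one row per customer per attribute
datamart_melt = pd.melt(
    datamart_normalized.reset_index(),
    id_vars=["CustomerID", "Cluster"],
    value_vars=["Recency", "Frequency", "MonetaryValue"],
    var_name="Attribute",
    value_name="Value",
)
print(datamart_melt.head())
```

Each customer now appears three times, once per attribute, which is the shape seaborn expects when one column drives the x-axis and another drives the colors.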

7. Visualize a snake plot

Finally, we can visualize the snake plot. We add a title and then call the lineplot function from the seaborn package. We pass the attribute to the x-axis and the value to the y-axis. Then we pass the cluster label to the hue argument, which draws a separate line for each cluster. The resulting snake plot makes it easy and intuitive to interpret and compare the segments and to identify interesting insights.

8. Relative importance of segment attributes

There is another technique, slightly different from the snake plot, although the underlying data preparation is similar. In general, we want our segments to differ from the overall population and to have distinctive properties of their own. We can use this technique to identify the relative importance of each attribute. First, we calculate the average RFM values for each cluster. Then, we do the same for the total population. Finally, we divide the cluster averages by the population averages and subtract 1 from the result.
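The three steps map directly onto pandas operations. This sketch assumes a hypothetical toy RFM table and hard-coded cluster labels in place of the real datamart_rfm and datamart_rfm_k3; cluster_avg, population_avg and relative_imp are illustrative names.

```python
import pandas as pd

# Hypothetical toy RFM table standing in for datamart_rfm
datamart_rfm = pd.DataFrame(
    {
        "Recency": [5, 10, 200, 250, 30, 40, 300, 15],
        "Frequency": [50, 40, 2, 1, 20, 15, 1, 45],
        "MonetaryValue": [900.0, 800.0, 20.0, 10.0, 300.0, 250.0, 5.0, 850.0],
    },
    index=pd.Index([101, 102, 103, 104, 105, 106, 107, 108], name="CustomerID"),
)
# Hypothetical 3-cluster labels standing in for datamart_rfm_k3
datamart_rfm_k3 = datamart_rfm.assign(Cluster=[0, 0, 2, 2, 1, 1, 2, 0])

# 1. Average RFM values for each cluster
cluster_avg = datamart_rfm_k3.groupby("Cluster").mean()
# 2. Average RFM values for the whole population
population_avg = datamart_rfm.mean()
# 3. Divide cluster averages by population averages (aligned on column names), subtract 1
relative_imp = cluster_avg / population_avg - 1
print(relative_imp.round(2))
```

Dividing a DataFrame by a Series aligns on column names, so each cluster's Recency is divided by the population's Recency, and so on. A score of 0 means the cluster matches the population average; a score of 0.5 means it is 50% higher.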

9. Analyze and plot relative importance

The result is a relative importance score for each RFM value of each segment. The further the score is from zero, the more important that attribute is in defining a specific cluster compared to the population average. We can inspect the scores by printing the rounded values, or we can plot them as a heatmap, the chart type we used in our first lesson on cohort analysis.
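Because distance from zero is what matters, comparing absolute scores picks out each cluster's most distinctive attribute. The scores below are hypothetical numbers chosen for illustration, not results from the lesson's data, and most_distinctive is an illustrative name.

```python
import pandas as pd

# Hypothetical relative importance scores (cluster average / population average - 1)
relative_imp = pd.DataFrame(
    {
        "Recency": [-0.91, 0.35, 1.62],
        "Frequency": [1.07, -0.20, -0.94],
        "MonetaryValue": [1.04, -0.15, -0.97],
    },
    index=pd.Index([0, 1, 2], name="Cluster"),
)

# Print the rounded scores
print(relative_imp.round(2))

# Distance from zero is what matters, so compare absolute values;
# idxmax(axis=1) returns the column with the largest |score| in each row
most_distinctive = relative_imp.abs().idxmax(axis=1)
print(most_distinctive)
```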

10. Relative importance heatmap

Here we go: the heatmap is easier to interpret, and it displays the actual numeric values as well. Compared with the printed output, the heatmap's colors make it much quicker to see which attributes set each cluster apart.
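A heatmap version with seaborn, assuming the same kind of hypothetical score table; the title, figure size and RdYlGn color map are illustrative choices, and the Agg backend line only lets the script run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for scripts; omit this in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical relative importance scores, one row per cluster
relative_imp = pd.DataFrame(
    {
        "Recency": [-0.91, 0.35, 1.62],
        "Frequency": [1.07, -0.20, -0.94],
        "MonetaryValue": [1.04, -0.15, -0.97],
    },
    index=pd.Index([0, 1, 2], name="Cluster"),
)

fig, ax = plt.subplots(figsize=(8, 2))
ax.set_title("Relative importance of attributes")
# annot=True writes each score into its cell; fmt=".2f" shows two decimals
sns.heatmap(data=relative_imp, annot=True, fmt=".2f", cmap="RdYlGn", ax=ax)
# plt.show()  # display the chart in an interactive session
```

The annotations keep the exact numbers visible, while the diverging color map makes large positive and negative scores stand out.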

11. Your time to experiment with different customer profiling techniques!

Great job, everybody! Now it's your turn to experiment with these techniques!
