1. Recency, frequency, monetary (RFM) segmentation
Congratulations, you're now equipped to group customers into cohorts and analyze their behavior over time. In this chapter, we will dive into a very popular technique called RFM segmentation, which stands for Recency, Frequency and Monetary value segmentation.
2. What is RFM segmentation?
RFM segmentation is built on three customer behavior metrics:
Recency - how recently each customer made their last purchase,
Frequency - how many purchases the customer has made in the last 12 months,
MonetaryValue - how much the customer has spent in the last 12 months.
We will use these values to assign customers to RFM segments.
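As a rough illustration of the three metrics, here is a minimal sketch on a made-up transactions table. The column names (CustomerID, InvoiceDate, Amount), the sample values, and the choice of snapshot date (one day after the last transaction) are all assumptions for this example, not part of the course dataset.

```python
import pandas as pd

# Hypothetical transactions; column names and values are illustrative only.
transactions = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceDate": pd.to_datetime([
        "2023-01-15", "2023-06-01", "2023-03-10",
        "2023-02-20", "2023-05-05", "2023-06-20",
    ]),
    "Amount": [50.0, 20.0, 75.0, 10.0, 30.0, 25.0],
})

# Assumed reference point: the day after the last transaction in the data.
snapshot_date = transactions["InvoiceDate"].max() + pd.Timedelta(days=1)

# Keep only the last 12 months of activity (every sample row qualifies).
window_start = snapshot_date - pd.DateOffset(months=12)
recent = transactions[transactions["InvoiceDate"] > window_start]

# One row per customer: last purchase date, purchase count, total spend.
rfm = recent.groupby("CustomerID").agg(
    LastPurchase=("InvoiceDate", "max"),
    Frequency=("InvoiceDate", "count"),
    MonetaryValue=("Amount", "sum"),
)

# Recency = days between the snapshot date and the last purchase.
rfm["Recency"] = (snapshot_date - rfm["LastPurchase"]).dt.days

print(rfm[["Recency", "Frequency", "MonetaryValue"]])
```

Lower Recency means a more recent purchase, while higher Frequency and MonetaryValue mean more activity and more spend.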
3. Grouping RFM values
Once we have calculated these numbers, the next step is to group them into categories such as high, medium and low. There are multiple ways to do that:
We can break customers into groups of equal size based on percentile values of each metric,
We can assign either a high or a low value to each metric based on an 80/20 Pareto split,
Or we can use existing knowledge from previous business insights about certain threshold values for each metric.
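The three grouping options can be sketched side by side on a single metric. The Spend values, the reading of the Pareto split as "top 20% of customers are high", and the threshold values of 100 and 250 are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical spend values for 8 customers.
spend = pd.Series([120, 40, 300, 75, 510, 220, 15, 95], name="Spend")

# Option 1: equal-sized groups based on percentiles (here, quartiles 1 to 4).
by_percentile = pd.qcut(spend, q=4, labels=range(1, 5))

# Option 2: high/low split, treating the top 20% of values as "high".
cutoff = spend.quantile(0.8)
by_pareto = spend.ge(cutoff).map({True: "high", False: "low"})

# Option 3: fixed thresholds from business knowledge (values assumed here).
by_threshold = pd.cut(
    spend,
    bins=[0, 100, 250, float("inf")],
    labels=["low", "medium", "high"],
)

print(pd.DataFrame({
    "Spend": spend,
    "Percentile": by_percentile,
    "Pareto": by_pareto,
    "Threshold": by_threshold,
}))
```

Only the first option guarantees groups of roughly equal size; the other two can produce very uneven groups, depending on how the metric is distributed.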
In the next section you will learn how to assign a percentile to a metric, and then create a label to be used for segmentation.
4. Short review of percentiles
The process of calculating percentiles is fairly simple:
First, you sort the customers based on that metric.
Then, you break the customers into a number of groups that you think is relevant. The groups are equal in size.
Finally, you assign a label to each group.
Luckily, pandas already has a built-in function for this called qcut().
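To make the three steps concrete, here is a minimal sketch that performs them by hand, without qcut(). The Spend values and the Spend_Group column name are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical spend data; values are illustrative only.
data = pd.DataFrame({
    "CustomerID": range(1, 9),
    "Spend": [137, 335, 172, 355, 303, 233, 244, 229],
})

# Step 1: sort the customers by the metric.
ordered = data.sort_values("Spend").reset_index(drop=True)

# Step 2: break the sorted rows into 4 groups of equal size.
n_groups = 4
positions = np.arange(len(ordered))                  # 0, 1, ..., 7
group_index = positions * n_groups // len(ordered)   # 0, 0, 1, 1, 2, 2, 3, 3

# Step 3: assign a label to each group (1 = lowest spend, 4 = highest).
ordered["Spend_Group"] = group_index + 1

print(ordered)
```

With 8 customers and 4 groups, each group holds exactly two customers. qcut() handles the same job, including cases where the rows do not divide evenly.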
5. Calculate percentiles with Python
To illustrate the concepts behind percentile calculations, we have created a simple dataset with 8 CustomerIDs and random Spend values representing their total spend with the company.
6. Calculate percentiles with Python
We will now assign a quartile value to each of these customers.
First, we will use the qcut() function on the Spend variable, and specify that we want 4 groups of equal size, called quartiles. We will also pass a range() object to the labels argument so our groups have integer names, with the highest-value quartile labeled 4 and the lowest labeled 1.
Next, we add a column to our dataframe.
And then we print it after sorting by the quartile value.
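These steps might look like the sketch below. The Spend values and the Spend_Quartile column name are assumptions for illustration; the original sample dataset is not reproduced here.

```python
import pandas as pd

# Hypothetical dataset: 8 customers with made-up Spend values.
data = pd.DataFrame({
    "CustomerID": range(1, 9),
    "Spend": [137, 335, 172, 355, 303, 233, 244, 229],
})

# Split Spend into 4 equal-sized groups labeled 1 (lowest) to 4 (highest).
spend_quartiles = pd.qcut(data["Spend"], q=4, labels=range(1, 5))

# Add the quartile as a new column.
data["Spend_Quartile"] = spend_quartiles

# Print the customers sorted by their quartile.
print(data.sort_values("Spend_Quartile"))
```

Note that range(1, 5) yields 1, 2, 3, 4, since the upper bound is excluded.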
7. Assigning labels
When assigning labels, we want them to show which customers fall into the top and the bottom percentiles of the sorted values, but the highest value of a metric is not always the best. For example, the recency metric, which counts the days since the last purchase, is better when it is low rather than high.
For this example, we have created a sample dataset with 8 CustomerIDs and their Recency in days.
8. Assigning labels
Let's create a list of labels - only this time the values are reversed as lower recency is rated higher.
We will use the qcut() function on the Recency variable, and define that we want 4 groups of equal size. We will pass the list of labels we created above.
Next, we add a column to our dataset.
And then print it after sorting by the recency_days value.
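A minimal sketch of the reversed labeling follows. The recency values and the column names Recency_Days and Recency_Quartile are assumptions made for this example.

```python
import pandas as pd

# Hypothetical dataset: 8 customers with made-up recency values in days.
data = pd.DataFrame({
    "CustomerID": range(1, 9),
    "Recency_Days": [37, 235, 396, 72, 255, 393, 203, 133],
})

# Reversed labels: the lowest-recency group gets 4, the highest gets 1.
r_labels = list(range(4, 0, -1))  # [4, 3, 2, 1]

# Split recency into 4 equal-sized groups using the reversed labels.
recency_quartiles = pd.qcut(data["Recency_Days"], q=4, labels=r_labels)

# Add the quartile as a new column.
data["Recency_Quartile"] = recency_quartiles

# Print the customers sorted by recency.
print(data.sort_values("Recency_Days"))
```

qcut() assigns labels in order from the lowest bin to the highest, so reversing the label list is enough to flip the ranking.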
9. Assigning labels
As you can see, the lower the recency, the higher the quartile value. When assigning labels, you should always consider whether higher or lower values of the metric should receive the higher rank.
10. Custom labels
We can also create custom named labels.
First, we create named labels as strings in descending order. We use descending order because we are ranking the Recency metric, where lower values are better.
Then we run everything as before and get a new Recency label based on the previously defined values.
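Custom named labels can be sketched the same way. The label names below, the recency values, and the column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical dataset: 8 customers with made-up recency values in days.
data = pd.DataFrame({
    "CustomerID": range(1, 9),
    "Recency_Days": [37, 235, 396, 72, 255, 393, 203, 133],
})

# Assumed label names, ordered from best (lowest recency) to worst.
r_labels = ["Active", "Lapsed", "Inactive", "Churned"]

# The first label goes to the lowest-recency group, the last to the highest.
recency_quartiles = pd.qcut(data["Recency_Days"], q=4, labels=r_labels)

# Add the named label as a new column and print sorted by recency.
data["Recency_Quartile"] = recency_quartiles
print(data.sort_values("Recency_Days"))
```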
11. Custom labels
Although this is a small sample, it does show the main concepts of how to use percentiles to group customers based on their usage behavior.
12. Let's practice with percentiles!
Now, you will practice assigning percentiles to different values and creating custom labels for them.