1. Calculating and projecting CLV
Great progress! We will now learn how to calculate customer lifetime value with these three methods using Python!
2. The goal of CLV
Let's first recap the goals of customer lifetime value.
With CLV we want to measure customer value in terms of revenue or profit.
This lets us benchmark customers and assess the maximum amount the company can afford to spend on acquiring new customers, given the lifetime value it expects to earn from an average customer.
In our example dataset, we don't have the profit margin and we won't assume an artificial one. For the sake of simplicity we will leave the profit margin out of the calculation and calculate revenue-based CLV.
Here is the traditional CLV formula updated to its revenue-based form. We will use the revenue-based methodology in all three methods.
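The slide with the formula is not part of this transcript. Based on the calculation described later in this lesson (average monthly revenue multiplied by the retention-to-churn ratio), the revenue-based form can be written roughly as follows. The symbol names are chosen here for illustration, and the margin-based version is the usual textbook form rather than something quoted from the slide.

```latex
% Usual margin-based form (assumed, not quoted from the slide)
\mathrm{CLV} = (\text{average revenue} \times \text{profit margin}) \times \frac{\text{retention rate}}{\text{churn rate}}

% Revenue-based form used in this lesson: the profit margin is dropped
\mathrm{CLV}_{\text{revenue}} = \text{average revenue} \times \frac{\text{retention rate}}{\text{churn rate}},
\qquad \text{churn rate} = 1 - \text{retention rate}
```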
3. Basic CLV calculation
Great, let's start with the basic customer lifetime value calculation.
First, we calculate the monthly revenue for each customer. We group by CustomerID and InvoiceMonth, sum up the revenue stored in the TotalSum column, and then take the mean of those customer-month totals to get the overall average monthly revenue.
Afterwards, we define the customer lifespan. This is a broad topic that could take a full course on its own, but ultimately this depends on the business model, customer lifetime expectation, and other data points. This can be inferred by looking into the average time it takes customers to churn from the time they made their first purchase. For now, we will assume that the customer lifespan is 36 months, or 3 years.
Finally, we calculate the basic CLV by multiplying the monthly average revenue and the lifespan.
After printing the result, we can see that the average basic CLV is 4774 dollars.
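The steps above can be sketched as follows. This is a minimal illustration on a made-up six-row table, not the course dataset: the values are hypothetical, the DataFrame name `online` is an assumption, and the invoice column is called InvoiceNumber here only because that is how the narration refers to it.

```python
import pandas as pd

# Hypothetical toy transactions (NOT the course data); each row is one invoice line.
online = pd.DataFrame({
    "CustomerID":    [1, 1, 1, 2, 2, 3],
    "InvoiceNumber": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "InvoiceMonth":  ["2011-01", "2011-01", "2011-02", "2011-01", "2011-01", "2011-02"],
    "TotalSum":      [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Revenue per customer per month, then the overall average of those totals.
monthly_revenue = (
    online.groupby(["CustomerID", "InvoiceMonth"])["TotalSum"].sum().mean()
)

# Assumed customer lifespan in months, as in the lesson.
lifespan_months = 36

# Basic CLV = average monthly revenue * lifespan.
clv_basic = monthly_revenue * lifespan_months

print(f"Average basic CLV: {clv_basic:.1f} USD")
```

Only customer-months that contain at least one purchase enter the average, because months without purchases never appear as groups.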
4. Granular CLV calculation
Now, we will look into more granular transaction or invoice level data points to calculate the granular customer lifetime value.
First, we will calculate the average revenue per purchase. We group on InvoiceNumber, which identifies a unique purchase, and then calculate the average. As you can see, we have called the mean function twice. This is not a mistake. The first call averages the revenue within each invoice, which gives us one data point per invoice. The second call averages those data points into a single number: the overall average revenue per purchase.
Next, we calculate the average number of unique invoices per customer each month. We do that by grouping on CustomerID and InvoiceMonth, and using the nunique function to count the number of unique invoices. Then we call the mean function to average these counts into one overall number.
Then, we set the lifespan as with the previous example, and then multiply the three values to get the granular CLV, and print it out with some other metrics.
We can see that the granular CLV is lower than the basic one at around 1635 dollars. This is a more conservative way to calculate CLV.
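A minimal sketch of the granular calculation as narrated, again on hypothetical toy data (the values, the name `online`, and the InvoiceNumber column name are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy transactions (NOT the course data); each row is one invoice line.
online = pd.DataFrame({
    "CustomerID":    [1, 1, 1, 2, 2, 3],
    "InvoiceNumber": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "InvoiceMonth":  ["2011-01", "2011-01", "2011-02", "2011-01", "2011-01", "2011-02"],
    "TotalSum":      [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# First mean(): average TotalSum of the rows within each invoice.
# Second mean(): average of those per-invoice values.
revenue_per_purchase = online.groupby("InvoiceNumber")["TotalSum"].mean().mean()

# Unique invoices per customer per month, averaged into one number.
frequency_per_month = (
    online.groupby(["CustomerID", "InvoiceMonth"])["InvoiceNumber"].nunique().mean()
)

# Assumed customer lifespan in months, as in the lesson.
lifespan_months = 36

# Granular CLV = revenue per purchase * purchases per month * lifespan.
clv_granular = revenue_per_purchase * frequency_per_month * lifespan_months

print(f"Average granular CLV: {clv_granular:.1f} USD")
```

Note that the first `mean()` averages the line items of an invoice rather than summing them, so an invoice with several rows contributes its average row value, not its total. That follows the narration. Replacing the first `mean()` with `sum()` would give the average invoice total instead.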
Let's jump into the traditional CLV calculation where we will get an even smaller CLV estimate.
5. Traditional CLV calculation
Alright. Now we will calculate the customer lifetime value with the traditional method, which does not require a lifespan to be defined and instead uses the ratio of retention to churn to estimate customer life expectancy.
We calculate the monthly revenue as we did with the basic CLV.
Then we calculate the retention rate from the monthly cohort retention dataset. Here, we exclude the first column, since in the first month every cohort is 100% active by definition. Then we average the retention for each month, and call the mean function a second time to get one overall number.
Afterwards, we calculate the churn rate which is just 1 minus retention.
Finally, we multiply the average monthly revenue by the retention-to-churn ratio to get the traditional CLV. Let's print it out together with the inputs to assess it.
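The traditional calculation might look like the sketch below. Both tables are hypothetical: `online` is the same toy transaction table used for illustration, and `retention` is a made-up cohort table (rows are cohorts, columns are months since first purchase) standing in for the monthly cohort retention dataset mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy transactions (NOT the course data).
online = pd.DataFrame({
    "CustomerID":   [1, 1, 1, 2, 2, 3],
    "InvoiceMonth": ["2011-01", "2011-01", "2011-02", "2011-01", "2011-01", "2011-02"],
    "TotalSum":     [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Hypothetical cohort retention table: column 1 is the acquisition month (always 100%).
retention = pd.DataFrame(
    {1: [1.0, 1.0, 1.0], 2: [0.4, 0.2, np.nan], 3: [0.3, np.nan, np.nan]},
    index=["2011-01", "2011-02", "2011-03"],
)

# Average monthly revenue, computed as in the basic method.
monthly_revenue = (
    online.groupby(["CustomerID", "InvoiceMonth"])["TotalSum"].sum().mean()
)

# Drop the first column, average each remaining month across cohorts,
# then average those monthly values (NaNs are skipped by mean()).
retention_rate = retention.iloc[:, 1:].mean().mean()

# Churn is the complement of retention.
churn_rate = 1 - retention_rate

# Traditional CLV = average monthly revenue * retention / churn.
clv_traditional = monthly_revenue * (retention_rate / churn_rate)

print(f"Retention: {retention_rate:.1%}, churn: {churn_rate:.1%}")
print(f"Average traditional CLV: {clv_traditional:.1f} USD")
```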
We can see that the traditional CLV is significantly lower than the other two measures. The root cause is that previously we used a pre-defined customer lifespan, while here the customer life expectancy is inferred from the retention-to-churn ratio. The retention is very low (below 50%), so churn exceeds retention and the multiplier is less than 1. Typically, retention numbers are higher, somewhere around 80-90%, which would put this CLV value at roughly 500 and 1200 dollars respectively. This model assumes that churn is final, i.e. customers who don't come back the next month are not coming back in later periods. We won't explore the retention definition here, but you can test different time periods, like quarterly or even annual retention, for this and other datasets to assess the impact on the retention and churn values.
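As a quick check of the 80-90% figures, the arithmetic can be reproduced as below. The average monthly revenue is not stated directly in this lesson, so it is back-derived here from the basic CLV of 4774 dollars over 36 months. That derivation is an assumption of this sketch.

```python
# Average monthly revenue implied by the basic CLV (assumption: 4774 USD over 36 months).
monthly_revenue = 4774 / 36

def traditional_clv(revenue: float, retention_rate: float) -> float:
    """Revenue-based traditional CLV: revenue * retention / churn."""
    churn_rate = 1 - retention_rate
    return revenue * retention_rate / churn_rate

# 80% retention gives a multiplier of 4, 90% retention a multiplier of 9.
clv_at_80 = traditional_clv(monthly_revenue, 0.80)
clv_at_90 = traditional_clv(monthly_revenue, 0.90)

print(f"CLV at 80% retention: {clv_at_80:.0f} USD")
print(f"CLV at 90% retention: {clv_at_90:.0f} USD")
```

The multiplier grows quickly as retention approaches 100%, so small changes in the retention estimate lead to large changes in the resulting CLV.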
6. Which method to use?
Now, these are just a few models on top of other more statistical approaches that we're not covering in this course. The choice of the formula depends on the business type and the main goal.
One thing to keep in mind with the traditional CLV model is that churn is assumed to be definitive, i.e. a customer who has churned once is expected not to come back. This assumption must be validated before using this approach.
As you've seen in the calculation, the model is not robust at low retention values: the reported customer lifetime values will be too low, even lower than the average monthly revenue.
Overall, the hardest thing to predict when approaching lifetime value calculation is the frequency of future purchases. In the next lesson we will learn how to do that using regression models.
7. Let's calculate customer lifetime values!
Now, let's test your knowledge in calculating customer lifetime value!