Customer lifetime value in CRM

1. Customer lifetime value in CRM

Welcome to this course on Marketing Analytics in R and Statistical Modeling. My name is Verena Pflieger, and I am a data scientist at the consultancy INWT Statistics in Berlin, Germany.

2. INWT Statistics GmbH

INWT is a company that specializes in data science in the fields of online marketing, customer relationship management, and business intelligence. In this course, I will introduce you to statistical methods applied to the field of marketing analytics. First, we will model customer lifetime value using linear regression. Then, we will model customer churn using logistic regression. Additionally, we will use survival analysis to predict the time until a person orders, and finally, we will use principal component analysis to handle high-dimensional marketing data.

3. Customer lifetime value (CLV)

The customer lifetime value, or CLV, describes the predicted future net profit that a company accumulates through its relationship with a customer. Since the CLV is a forecast, estimating it comes with several challenges. Once it is estimated, we can identify customers who are likely to generate higher net profits. In practice, this helps us target or prioritize customers according to their expected future profits.

4. Predicting the Margin of Year 2

Net profit, also known as margin, is the metric of interest. For this reason, we want to find the drivers that affect the magnitude of the margin. However, there is one tricky aspect to this. We want to predict the future margin using only data that is available at the time of prediction, not data that will only become available in the future. Hence, we need a model that uses current information to predict the future margin. We therefore apply a two-step procedure. First, we take the explanatory variables from year one and use them to predict the dependent variable in year two. This is the model specification step, done on the dataset called `clvData1`. Once the model is specified, we move on to the next step.
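To make the model specification step concrete, here is a minimal sketch in R. It does not use the real `clvData1`. Instead it simulates a small stand-in, and the column names `nOrders`, `margin`, and `futureMargin` as well as all coefficients are assumptions chosen purely for illustration.

```r
# Simulated stand-in for clvData1 (assumption: the real column names may differ)
set.seed(1)
n <- 200
clvData1 <- data.frame(
  nOrders = rpois(n, lambda = 5),          # year-one number of orders
  margin  = rnorm(n, mean = 30, sd = 10)   # year-one margin
)

# Year-two margin, generated here from the year-one variables plus noise
clvData1$futureMargin <- 5 + 0.6 * clvData1$margin +
  1.5 * clvData1$nOrders + rnorm(n, sd = 5)

# Step 1: specify the model with year-one predictors and the year-two outcome
clvModel <- lm(futureMargin ~ margin + nOrders, data = clvData1)
summary(clvModel)
```

The point of the sketch is the direction of time: every variable on the right-hand side of the formula comes from year one, and only the dependent variable comes from year two.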

5. Predicting the Future Margin

In the second step, we take the explanatory variables from year two and make predictions for the future margin in year three, a period about which we do not yet have any information. Information about year two is stored in the dataset `clvData2`. The variables used for prediction need to have the same names in both datasets.
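The prediction step could look roughly like the sketch below. Both data frames are simulated stand-ins, and the column names `nOrders`, `margin`, `futureMargin`, and `predictedMargin` are hypothetical. The sketch also checks that the predictor names match across the two datasets, since `predict` looks the variables up by name in `newdata`.

```r
set.seed(2)
n <- 200

# Hypothetical year-one data with the observed year-two margin attached
clvData1 <- data.frame(nOrders = rpois(n, 5), margin = rnorm(n, 30, 10))
clvData1$futureMargin <- 5 + 0.6 * clvData1$margin +
  1.5 * clvData1$nOrders + rnorm(n, sd = 5)

# Hypothetical year-two data: same column names, but no future margin yet
clvData2 <- data.frame(nOrders = rpois(n, 5), margin = rnorm(n, 32, 10))

# Model specified on year-one predictors (step 1)
clvModel <- lm(futureMargin ~ margin + nOrders, data = clvData1)

# Check that every predictor used by the model exists in clvData2
predictors <- all.vars(formula(clvModel))[-1]  # drop the response variable
stopifnot(all(predictors %in% names(clvData2)))

# Step 2: use year-two information to predict the year-three margin
clvData2$predictedMargin <- predict(clvModel, newdata = clvData2)
head(clvData2)
```

If a predictor were missing or named differently in `clvData2`, the name check would stop the script before `predict` is called.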

6. CLV Data

Let's take a look at a typical CLV dataset. The dataset `clvData1` holds 13 aggregated metrics on the ordering behavior of 4200 customers for a certain year. Additionally, the margin generated in the following year is included. We look at the structure of the dataset using the `str` function from the *utils* package, which is loaded automatically. The data contains information like `gender`, `age`, the number of orders per year, and the customer's yearly margin. Additionally, we find order-specific metrics like the number of items per order, a returned goods ratio, and so on.
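As a small illustration of this kind of inspection, the sketch below calls `str` on a three-row toy data frame. The toy data and all column names other than `gender` and `age` are made up for this example and are not taken from `clvData1`.

```r
# Toy stand-in for a CLV dataset (values and most column names are hypothetical)
clvToy <- data.frame(
  gender       = c("female", "male", "male"),
  age          = c(34, 51, 27),
  nOrders      = c(4L, 1L, 7L),          # orders in the observed year
  margin       = c(41.2, 12.9, 88.5),    # margin in the observed year
  futureMargin = c(38.0, 15.4, 92.1)     # margin in the following year
)

# str() prints one line per column: its name, type, and first values
str(clvToy)

# dim() returns the number of rows (customers) and columns (metrics)
dim(clvToy)
```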

7. Correlations

As a first measure of the relationships in our data, we will look at correlations. To do this, we take all the information from year one and correlate it with the future margin from year two. We can visualize the correlations calculated by the `cor` function in the *stats* package using the `corrplot` function from the *corrplot* package. Notice the strong positive correlations, plotted in blue, of the number of orders, the number of items, and the share of own brand with the future margin. Between the margin of the current year and the future margin, we observe a somewhat stronger positive correlation. Conversely, the days since the last order and the return ratio are moderately negatively correlated with the future margin, plotted in orange.
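A correlation analysis along these lines can be sketched as follows. The data is simulated, so the column names (`nOrders`, `margin`, `daysSinceLastOrder`, `futureMargin`) and the resulting correlations are assumptions and will not match the course dataset. The plot is only drawn if the *corrplot* package happens to be installed.

```r
set.seed(3)
n <- 300

# Simulated year-one metrics (hypothetical names and distributions)
clvSim <- data.frame(
  nOrders            = rpois(n, 5),
  margin             = rnorm(n, 30, 10),
  daysSinceLastOrder = rexp(n, rate = 1 / 60)
)

# Simulated year-two margin that depends on the year-one metrics
clvSim$futureMargin <- 0.6 * clvSim$margin + 1.5 * clvSim$nOrders -
  0.05 * clvSim$daysSinceLastOrder + rnorm(n, sd = 5)

# Pearson correlation matrix of all numeric columns (the default of stats::cor)
corMat <- cor(clvSim)

# Correlation of each variable with the future margin
round(corMat[, "futureMargin"], 2)

# Visualize the matrix if corrplot is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(corMat)
}
```

Reading a single column of the matrix, here the one for `futureMargin`, is often enough for a first look at which year-one metrics move together with the outcome.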

8. Let's practice!

Time to put this into practice.