Cohort analysis
1. Cohort analysis
Now we will learn about the most popular type of cohort analysis: time cohorts. We will segment customers into acquisition cohorts based on the month they made their first purchase. We will then assign a cohort index to each of the customer's purchases. It represents the number of months since the first transaction.
2. Cohort analysis heatmap
Time-based cohorts group customers by the time they completed their first activity. In this lesson, we will group customers into cohorts based on the month of their first purchase. Then we will mark each transaction with its relative time period since the first purchase. In this example, that is the number of months since acquisition. In the next step we will calculate metrics like retention or average spend value, and build this heatmap.
3. Cohort analysis heatmap
For example, this number means that 24% of the cohort that signed up in August 2011 was active 4 months later. Column one here is the month of first purchase, so its retention rate is 100%. This is by definition, as customers had to be active in that month to be assigned to the cohort.
4. Online retail data
A little bit about the data. We will use a 20% random sample from an online retail dataset with half a million transactions. This is a realistic dataset of customer transactions that is commonly used in segmentation. Let's look at its first 5 rows.
5. Top 5 rows of data
The data contains 7 columns describing the customer transactions. The main ones we will use are the date, the price, and the customer ID. Now that we have loaded the data, let's build a simple cohort table for time-based cohorts.
6. Assign acquisition month cohort
First, we create a function that truncates a given date object to the first day of its month. Then we apply it to InvoiceDate and create an InvoiceMonth column. Next, we create a groupby() object on CustomerID and select the InvoiceMonth column for further manipulation. Finally, we use transform() together with the min() function to assign the smallest InvoiceMonth value to each customer. With just that, we have assigned the acquisition month cohort to each customer. Let's look at the data. We have added two columns: InvoiceMonth and CohortMonth. Now, let's calculate the time offset!
7. Extract integer values from data
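As a recap of the acquisition month step, here is a minimal sketch of how it could look in pandas. The function name get_month and the toy rows are assumptions made for illustration; only the column names InvoiceDate, InvoiceMonth, CohortMonth and CustomerID come from the lesson.

```python
import datetime as dt

import pandas as pd

# Toy transactions, invented for illustration only (not the course dataset).
online = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2],
    "InvoiceDate": pd.to_datetime([
        "2011-01-15", "2011-03-02", "2011-02-10", "2011-02-25", "2011-05-07",
    ]),
})


def get_month(x):
    """Truncate a date to the first day of its month (hypothetical helper name)."""
    return dt.datetime(x.year, x.month, 1)


# Month of each transaction.
online["InvoiceMonth"] = online["InvoiceDate"].apply(get_month)

# Earliest InvoiceMonth per customer, broadcast back to every row of that customer.
online["CohortMonth"] = (
    online.groupby("CustomerID")["InvoiceMonth"].transform("min")
)

print(online)
```

Using transform() rather than an aggregation keeps the result aligned with the original rows, so it can be assigned straight back as a new column.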
Before we can calculate the time offset, we will first create a helper function that extracts the integer values of the year, month and day from a datetime object.
8. Assign time offset value
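The helper described in the previous step might look like the sketch below. The name get_date_int, its signature, and the toy data are assumptions for illustration; it relies on the pandas .dt accessor, so the column must hold datetime values.

```python
import pandas as pd


def get_date_int(df, column):
    """Return year, month and day of a datetime column as three integer Series.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    year = df[column].dt.year
    month = df[column].dt.month
    day = df[column].dt.day
    return year, month, day


# Toy data, invented for illustration only.
toy = pd.DataFrame({"InvoiceMonth": pd.to_datetime(["2011-01-01", "2011-03-01"])})

year, month, day = get_date_int(toy, "InvoiceMonth")
print(year.tolist(), month.tolist(), day.tolist())
# [2011, 2011] [1, 3] [1, 1]
```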
Now, we will calculate the number of months between any transaction and the first transaction for each customer. We will use the InvoiceMonth and CohortMonth values to do this. We will start by creating two objects with the year and month integer values from each of the InvoiceMonth and CohortMonth variables. Then we will calculate the differences in years and months between them. Finally, we will convert the total difference to months by multiplying the year difference by 12 and adding the month difference. You can see there's a "+1" at the end. We do this so the first month is marked as 1 instead of 0 for easier interpretation. You can see that the new column is added. Now, let's pull some metrics!
9. Count monthly active customers from each cohort
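Before counting, it may help to see the offset calculation from the previous step written out. This is a minimal sketch on invented toy rows; it reads the year and month directly through the .dt accessor instead of a helper function, and assumes both columns are already truncated to the first day of the month.

```python
import pandas as pd

# Toy data, invented for illustration only.
online = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2],
    "InvoiceMonth": pd.to_datetime(
        ["2011-01-01", "2011-03-01", "2011-11-01", "2012-02-01"]
    ),
    "CohortMonth": pd.to_datetime(
        ["2011-01-01", "2011-01-01", "2011-11-01", "2011-11-01"]
    ),
})

# Differences in calendar years and calendar months.
years_diff = online["InvoiceMonth"].dt.year - online["CohortMonth"].dt.year
months_diff = online["InvoiceMonth"].dt.month - online["CohortMonth"].dt.month

# Total offset in months; the +1 marks the acquisition month as 1 rather than 0.
online["CohortIndex"] = years_diff * 12 + months_diff + 1

print(online["CohortIndex"].tolist())
# [1, 3, 1, 4]
```

The last row shows why the year difference matters: November 2011 to February 2012 gives 1 * 12 + (2 - 11) + 1 = 4, even though the month difference on its own is negative.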
Now we will calculate the number of monthly active customers in each cohort. First, we will create a groupby object with CohortMonth and CohortIndex. Then, we will count the number of unique customers in each group by applying the pandas nunique() function. Then, we reset the index and create a pandas pivot table with CohortMonth in the rows, CohortIndex in the columns, and CustomerID counts as values. Let's take a look at our table.
10. Table with monthly active customers for each cohort
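The counting steps described above can be sketched as follows. The toy rows and the variable names grouping, cohort_data and cohort_counts are assumptions for illustration; the column names come from the lesson.

```python
import pandas as pd

# Toy data, invented for illustration only.
online = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "CohortMonth": pd.to_datetime([
        "2011-01-01", "2011-01-01", "2011-01-01",
        "2011-01-01", "2011-01-01", "2011-02-01",
    ]),
    "CohortIndex": [1, 1, 2, 1, 3, 1],
})

# Group transactions by acquisition cohort and months since acquisition.
grouping = online.groupby(["CohortMonth", "CohortIndex"])

# Count distinct customers per group, then flatten the index into columns.
cohort_data = grouping["CustomerID"].nunique().reset_index()

# Cohorts in the rows, month offsets in the columns, customer counts as values.
cohort_counts = cohort_data.pivot(
    index="CohortMonth", columns="CohortIndex", values="CustomerID"
)

print(cohort_counts)
```

Cells with no active customers come out as NaN, which is why younger cohorts have empty cells toward the right of the table. Using nunique() instead of a plain count means a customer with several purchases in the same month is counted once.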
This is the result! We have created a table that will serve as the basis for the rest of this chapter.
11. Your turn to build some cohorts!
In the next lesson we will learn how to calculate the retention rate. It's very simple and just a few lines of code away! Now it's your turn to build some cohorts!