1. Understanding and visualizing trends in customer data
Awesome work! Examining plots in this way can be a tremendous way to understand trends in user behavior.
2. Further techniques for uncovering trends
Often, plotting a graph is not enough, and additional preprocessing is required to uncover the trend. Here we will consider two such preprocessing techniques and explore how they are useful.
3. Subscribers Per Day
To start, let’s look at the USA subscribers-per-day number for a new product without a trial period. Here we have a dataset of subscribers broken out by registration and subscription date.
For a given day, we will look at the number of subscribers who registered within the prior week. To do this, we find the difference between the registration and subscription dates, filter on this value, and finally group and aggregate by subscription date.
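As a minimal sketch of that filter-and-aggregate step, assuming a DataFrame `sub_data` with datetime columns `reg_date` and `sub_date` (all names here are illustrative, not the course's actual dataset), it might look like this:

```python
import pandas as pd

# Illustrative sketch: assumes a DataFrame `sub_data` with one row per
# subscriber and datetime columns `reg_date` and `sub_date` (names assumed).
sub_data = pd.DataFrame({
    'reg_date': pd.to_datetime(['2018-01-01', '2018-01-03', '2018-01-02']),
    'sub_date': pd.to_datetime(['2018-01-05', '2018-01-20', '2018-01-06']),
})

# Difference between registration and subscription dates
sub_data['sub_time'] = sub_data['sub_date'] - sub_data['reg_date']

# Keep only subscribers who registered within the prior week,
# then group and aggregate by subscription date
recent = sub_data[sub_data['sub_time'] <= pd.Timedelta(days=7)]
subs_per_day = recent.groupby('sub_date').size()
print(subs_per_day)
```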
4. Weekly seasonality and our pricing change
There seems to be weekly seasonality, as evidenced by the peaks and valleys roughly every seven days. Perhaps users are more likely to buy on the weekends, when they are using the app more heavily, than during the middle of the week.
Many metrics will have seasonality, and it can easily obfuscate macro-level trends in the data.
Specifically, we have reason to be concerned that a recent pricing change is causing a dip in subscription volume, but that can be hard to tell when the data has a high degree of seasonal movement, as we see here.
5. Correcting for seasonality with trailing averages
We can correct for this movement by calculating a trailing average over our data. A trailing average is a smoothing technique that sets the value for a given day as the average over the past n days. To smooth weekly seasonality we want n equal to seven. This has the effect of averaging over a week, so that every day is pulled toward the weekly level, limiting day-level effects.
6. Calculating Trailing Averages
First, we use the pandas `rolling()` method to find the rolling window.
The primary parameters of `rolling()` are `window`, which is our n value from above, and `center`, which is a Boolean.
If `center` is True, the averaged value is placed at the middle of the window; if False, it is placed at the index we are looking back from, which is what we want.
7. Smoothing our USA subscription data
Once we have this, we call `mean()` to find our average over the window. We can calculate this and store it in an additional variable.
As we can see, this smoothing does a lot to flatten out our line and reveal the dip, unobscured.
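As a minimal sketch of this trailing average, assuming a DataFrame `usa_subscriptions` indexed by date with a daily subscriber count in a `subs` column (both names are illustrative), the calculation might look like this:

```python
import numpy as np
import pandas as pd

# Illustrative sketch: assumes a DataFrame `usa_subscriptions` indexed by
# date with a daily subscriber count in a `subs` column (names assumed).
dates = pd.date_range('2018-01-01', periods=28, freq='D')
rng = np.random.default_rng(0)
usa_subscriptions = pd.DataFrame(
    {'subs': np.where(dates.dayofweek >= 5, 120, 100) + rng.integers(0, 10, len(dates))},
    index=dates,
)

# window=7 averages over a full week; center=False places each averaged
# value at the day we are looking back from
rolling_subs = usa_subscriptions['subs'].rolling(window=7, center=False)

# Take the mean over each window and store it alongside the raw data
usa_subscriptions['smoothed_subs'] = rolling_subs.mean()
print(usa_subscriptions.tail())
```

Plotting `smoothed_subs` against the raw `subs` column flattens out the weekly peaks and valleys, making any dip from the pricing change easier to spot.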
8. Noisy data - Highest SKU purchases by date
Beyond seasonality, data can simply be noisy. Let's take a look at a graph of how many of our largest-SKU in-app items are purchased per day.
It is incredibly noisy, as the values vary widely from day to day. We can apply an exponential moving average to check whether any macro trends are hidden in this noise.
9. Smoothing with an exponential moving average
This type of average weights the points so that earlier ones count less than more recent ones within our window. This pulls our data back toward any central trend while maintaining any recent movements.
10. Smoothed purchases by date
We can use the `ewm()` method to find these weighted windows. To do this we specify the `span` argument to be our window size. Then we find the average of this weighted data.
Here we apply this to our set of purchase data, using a 30-day window. Determining windows like this can require prior knowledge of the structure of the data or some trial and error.
We can see that this removes a lot of the noise and reveals a slight upward trend.
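As a minimal sketch of this exponential smoothing, assuming a DataFrame `high_sku_purchases` indexed by date with a daily `purchases` column (names are illustrative), it might look like this:

```python
import numpy as np
import pandas as pd

# Illustrative sketch: assumes a DataFrame `high_sku_purchases` indexed by
# date with daily purchase counts in a `purchases` column (names assumed).
dates = pd.date_range('2018-01-01', periods=90, freq='D')
rng = np.random.default_rng(1)
high_sku_purchases = pd.DataFrame(
    {'purchases': rng.poisson(lam=20, size=len(dates)) + np.arange(len(dates)) // 30},
    index=dates,
)

# ewm(span=30) weights recent days more heavily than earlier ones;
# pandas converts span to a decay factor alpha = 2 / (span + 1)
exp_mean = high_sku_purchases['purchases'].ewm(span=30).mean()
high_sku_purchases['smoothed_purchases'] = exp_mean
print(high_sku_purchases.tail())
```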
11. Summary - Data Smoothing Techniques
These techniques are very useful for uncovering trends. Next, we will put them to use as we dive deeper into user data.
12. Let's practice!
Good luck, now let’s practice!