
Building functions to automate analysis

1. Building functions to automate analysis

In the previous chapter, you may have noticed that you were doing a lot of similar, repetitive tasks. Anytime you notice repetition in your work, you should think about how you can automate it.

2. Why build a function?

One of the best ways to make this kind of analysis faster is to create functions. For example, remember when you analyzed the quality of subscribers by day and conversion channel in the last chapter? You might want to conduct this kind of analysis repeatedly for different subsegments of the customer base.

3. Print daily_retention_rate

Rather than copying and pasting the code snippet and editing it each time, which can introduce typos and makes bugs harder to fix as they arise, it is better to write a function.

4. Building a retention function

So we define a function, retention_rate(), that takes a DataFrame and a list of column names. The function follows the same steps as before: it first calculates the number of retained users, then the number of subscribers, and finally divides the first by the second to obtain the retention rate. The difference is that the groupby() call now uses the column names the user passes in.
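As a rough sketch of what such a function could look like: the column names is_retained, converted, and user_id, as well as the small sample DataFrame, are assumptions made for illustration and are not given in this transcript.

```python
import pandas as pd

def retention_rate(dataframe, column_names):
    # Unique retained users per group (assumes a boolean `is_retained` column)
    retained = (
        dataframe[dataframe["is_retained"]]
        .groupby(column_names)["user_id"]
        .nunique()
    )
    # Unique subscribers per group (assumes a boolean `converted` column)
    converted = (
        dataframe[dataframe["converted"]]
        .groupby(column_names)["user_id"]
        .nunique()
    )
    # The two Series share the same group index, so division aligns by group
    return retained / converted

# Hypothetical sample data standing in for the marketing DataFrame
marketing = pd.DataFrame({
    "user_id": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "date_subscribed": pd.to_datetime(["2018-01-01"] * 3 + ["2018-01-02"] * 3),
    "subscribing_channel": ["Email", "Email", "Facebook",
                            "Email", "Facebook", "Facebook"],
    "converted": [True] * 6,
    "is_retained": [True, False, True, True, True, True],
})

# Any list of grouping columns works, here a single one
channel_retention = retention_rate(marketing, ["subscribing_channel"])
print(channel_retention)
```

Because the grouping columns are a parameter, the same function can be reused for other subsegments simply by passing a different list.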

5. Retention rate by channel

Now that we’ve defined the function, all we need to do to reproduce the retention rates from the previous chapter is call retention_rate() with the marketing DataFrame and pass the column names date_subscribed and subscribing_channel as a list of strings. After unstacking, we have the same results as before.
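The call and the unstacking step might look like the sketch below. The function body, the columns is_retained, converted, and user_id, and the sample rows are assumptions for illustration; only the marketing, date_subscribed, subscribing_channel, and daily_channel_retention names come from the course.

```python
import pandas as pd

def retention_rate(dataframe, column_names):
    # Assumed implementation: unique retained users / unique subscribers per group
    retained = dataframe[dataframe["is_retained"]].groupby(column_names)["user_id"].nunique()
    converted = dataframe[dataframe["converted"]].groupby(column_names)["user_id"].nunique()
    return retained / converted

# Hypothetical sample data standing in for the marketing DataFrame
marketing = pd.DataFrame({
    "user_id": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "date_subscribed": pd.to_datetime(["2018-01-01"] * 3 + ["2018-01-02"] * 3),
    "subscribing_channel": ["Email", "Email", "Facebook",
                            "Email", "Facebook", "Facebook"],
    "converted": [True] * 6,
    "is_retained": [True, False, True, True, True, True],
})

# Group by subscription date and channel; the result is a Series
# indexed by (date_subscribed, subscribing_channel)
daily_retention = retention_rate(
    marketing, ["date_subscribed", "subscribing_channel"]
)

# Move the channel level into the columns: one row per date, one column per channel
daily_channel_retention = daily_retention.unstack(level=1)
print(daily_channel_retention)
```

After unstacking, dates sit in the index and each channel has its own column, which is the shape the plotting steps expect.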

6. Plotting daily retention by channel

Next, we follow the same steps as before to plot our results and...

7. Messy daily retention rate chart

Here's the resulting plot with much less effort than in the previous chapter due to our function! However, as you can see, this is a crowded chart that is nearly impossible to read. I recommend looking at channels one-by-one to identify trends.

8. Plotting function

Again, instead of writing similar code over and over to plot the columns one at a time, we will create a function that produces several plots, one for each column. The function uses a for loop to go through each column in the DataFrame and plot it individually. Note that here we are using matplotlib's plot() function to keep things simple. Since the dates are still located in the DataFrame's index, we use the index attribute to display dates on the x-axis, and the column values go on the y-axis. We can now call this function on the same daily_channel_retention DataFrame.
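A minimal sketch of such a plotting function follows. The function name plotting, the titles and axis labels, the sample values, and the returned list of figures are all assumptions added for illustration; the Agg backend line is only there so the sketch runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def plotting(dataframe):
    figures = []
    # Iterating over a DataFrame yields its column labels
    for column in dataframe:
        fig = plt.figure()
        # Dates live in the index (x-axis); the column values go on the y-axis
        plt.plot(dataframe.index, dataframe[column])
        plt.title("Daily " + str(column) + " retention rate")
        plt.ylabel("Retention rate")
        plt.xlabel("Date")
        plt.xticks(rotation=45)
        plt.show()
        figures.append(fig)
    # Returning the figures is an addition for convenience, not part of the course
    return figures

# Made-up values standing in for the real daily_channel_retention DataFrame
daily_channel_retention = pd.DataFrame(
    {"Email": [0.5, 1.0, 0.0], "Facebook": [1.0, 0.75, 0.8]},
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
)

figures = plotting(daily_channel_retention)
```

Creating a new figure inside the loop keeps each channel on its own chart instead of overlaying all lines on one crowded plot.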

9. Email plot

Calling the function displays a plot for each column in the DataFrame, in this case a plot of each channel's retention rate by date. This is the plot for email. You can see that email has big spikes that often drop to 0. This is common because emails are typically sent in bulk, leading users to subscribe on the same limited set of days. When the retention rate is 0, this means no one subscribed on those days.

10. Let's practice!

Now it's your turn to practice building your own functions.
