Building functions to automate analysis
1. Building functions to automate analysis
In the previous chapter, you may have noticed that you were doing a lot of similar, repetitive tasks. Anytime you notice repetition in your work, you should think about how you can automate it.
2. Why build a function?
One of the best ways to make this kind of analysis faster is to create functions. For example, remember when you analyzed the quality of subscribers by day and conversion channel in the last chapter? You might want to conduct this kind of analysis repeatedly for different subsegments of the customer base.
3. Print daily_retention_rate
Rather than copying and pasting the code snippet and editing each copy, which invites typos and makes bugs harder to fix as they arise, it is better to write a function.
4. Building a retention function
So we define a function, retention_rate(), that takes a DataFrame and a list of column names. It follows the same steps as before: first, it calculates the total number of retained users, then the total number of subscribers, and finally it divides the first by the second to obtain the retention rate. This time, however, the groupby() method receives the user-supplied column names.
5. Retention rate by channel
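A minimal sketch of what the retention_rate() function described above might look like. The column names `user_id`, `converted`, and `is_retained` are assumptions about the dataset, and the small `toy` DataFrame is made up purely for illustration.

```python
import pandas as pd


def retention_rate(dataframe, column_names):
    """Retention rate per group: retained users / converted users.

    Assumes the columns 'user_id', 'converted', and 'is_retained' exist
    (hypothetical names; adjust them to match your data).
    """
    # Unique retained users in each group
    retained = (
        dataframe[dataframe["is_retained"] == True]
        .groupby(column_names)["user_id"]
        .nunique()
    )
    # Unique converted users (subscribers) in each group
    converted = (
        dataframe[dataframe["converted"] == True]
        .groupby(column_names)["user_id"]
        .nunique()
    )
    # pandas aligns the two Series on their group index before dividing
    return retained / converted


# Made-up toy data: two Email subscribers (one retained),
# two Facebook subscribers (both retained)
toy = pd.DataFrame(
    {
        "user_id": ["a", "b", "c", "d"],
        "subscribing_channel": ["Email", "Email", "Facebook", "Facebook"],
        "converted": [True, True, True, True],
        "is_retained": [True, False, True, True],
    }
)

rates = retention_rate(toy, ["subscribing_channel"])
print(rates)
```

Because the grouping columns are passed in as an argument, the same function can be reused for any segmentation without editing its body.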
Now that we’ve defined the function, all we need to do to reproduce the retention rates from the previous chapter is call retention_rate() with the marketing DataFrame and pass the date_subscribed and subscribing_channel columns as a list of strings. After unstacking, we have the same results as before.
6. Plotting daily retention by channel
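To illustrate the unstacking step mentioned above, here is a small sketch. Grouping by two columns yields a Series with a two-level index, and unstacking moves the channel level into the columns. The dates and values below are invented for illustration, and the name `daily_retention` is only a placeholder for the function's result.

```python
import pandas as pd

# Stand-in for the result of grouping by
# ['date_subscribed', 'subscribing_channel'] (values are made up)
index = pd.MultiIndex.from_tuples(
    [
        ("2018-01-01", "Email"),
        ("2018-01-01", "Facebook"),
        ("2018-01-02", "Email"),
        ("2018-01-02", "Facebook"),
    ],
    names=["date_subscribed", "subscribing_channel"],
)
daily_retention = pd.Series([0.5, 1.0, 0.75, 0.25], index=index)

# Move the channel level (level=1) into columns:
# one row per date, one column per channel
daily_channel_retention = pd.DataFrame(daily_retention.unstack(level=1))
print(daily_channel_retention)
```

The wide layout, with dates in the index and one column per channel, is the shape the plotting steps expect.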
Next, we follow the same steps as before to plot our results and...
7. Messy daily retention rate chart
Here's the resulting plot, produced with much less effort than in the previous chapter thanks to our function! However, as you can see, this is a crowded chart that is nearly impossible to read. I recommend looking at the channels one by one to identify trends.
8. Plotting function
Again, instead of writing similar code over and over to plot the columns one at a time, we will create a function that produces several plots, one for each column. The function uses a for loop to go through the columns of the DataFrame and plot each one individually. Note that here we are using matplotlib's plot() function to keep things simple. Since the dates are still located in the DataFrame's index, we use the index attribute to display dates on the x-axis, and the column values go on the y-axis. We can now call this function on the same daily_channel_retention DataFrame.
9. Email plot
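The plotting function described above might look something like the following sketch. The function name `plot_each_column`, the axis labels, the title wording, and the toy data are all assumptions made for illustration. This version returns the figures rather than showing them immediately, which is a design choice of the sketch and not necessarily how the lesson's function behaves.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_each_column(dataframe):
    """Draw one line plot per column, with the index on the x-axis.

    Hypothetical helper: name, labels, and titles are illustrative.
    """
    figures = []
    for column in dataframe:
        fig, ax = plt.subplots()
        # Dates live in the index; the column holds the values to plot
        ax.plot(dataframe.index, dataframe[column])
        ax.set_title(f"Daily {column} retention rate")
        ax.set_xlabel("Date")
        ax.set_ylabel("Retention rate")
        figures.append(fig)
    return figures


# Made-up wide-format data: one row per date, one column per channel
daily_channel_retention = pd.DataFrame(
    {"Email": [0.5, 0.75, 0.0], "Facebook": [1.0, 0.25, 0.6]},
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
)

figures = plot_each_column(daily_channel_retention)
```

In an interactive session, calling `plt.show()` afterwards displays each figure in turn.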
Calling it displays a plot for each column in the DataFrame, in this case a plot for each channel by date. This is the plot for email. You can see email has big spikes that often go down to 0. This is common because emails are typically sent in bulk, leading users to subscribe on the same limited set of days. When retention rate is 0, this means no one subscribed on those days.
10. Let's practice!
Now it's your turn to practice building your own functions.