Resolving inconsistencies

1. Resolving inconsistencies

In the previous exercise, you identified a dip in conversion rates for House Ads. It appears that the problem was that users were seeing ads in languages other than their preferred language. In this lesson, we'll assess the impact of this mistake.

2. Assessing impact

While you cannot ignore data related to errors in the campaign, you can estimate what conversion might have looked like if there had been no issues. One way to assess impact is to index all other languages' conversion rates to English during the period where the ads were running in the correct language for each user. We begin by slicing the house_ads DataFrame to include the rows where the date_served is prior to when the language bug arose. Using our conversion_rate() function, we calculate conversion rate for each language during that period.
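
As a rough sketch of this step, the snippet below slices a toy house_ads DataFrame to the rows served before the bug and computes a conversion rate per language. The data, the BUG_START date, and the body of conversion_rate() are assumptions made for illustration; the course's own helper may differ in detail.

```python
import pandas as pd

# Toy stand-in for house_ads; all values are made up for illustration.
house_ads = pd.DataFrame({
    "user_id": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
    "date_served": pd.to_datetime([
        "2018-01-05", "2018-01-06", "2018-01-07", "2018-01-08",
        "2018-01-09", "2018-01-10", "2018-01-12", "2018-01-13",
    ]),
    "language_preferred": ["English", "English", "English", "English",
                           "Spanish", "Spanish", "Spanish", "English"],
    "converted": [True, False, False, False, True, False, False, False],
})

def conversion_rate(df, column_names):
    """Hypothetical stand-in: unique converted users / unique users per group."""
    converted = df[df["converted"]].groupby(column_names)["user_id"].nunique()
    total = df.groupby(column_names)["user_id"].nunique()
    return (converted / total).fillna(0)

# Assumed first day of the language bug (not stated in the lesson text).
BUG_START = "2018-01-11"

# Keep only the rows served before the bug arose.
house_ads_no_bug = house_ads[house_ads["date_served"] < BUG_START]

# Conversion rate for each language during the bug-free period.
lang_conv = conversion_rate(house_ads_no_bug, ["language_preferred"])
print(lang_conv)
```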

3. Assessing impact

We then divide each language's conversion rate by the conversion rate of English in order to understand how well our marketing assets typically convert users for each language relative to English.
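
The indexing step can be sketched as a single division of a Series by one of its own elements. The rates below are invented placeholders, and the names lang_conv and language_index are assumptions for this example.

```python
import pandas as pd

# Made-up pre-bug conversion rates per language (illustrative only).
lang_conv = pd.Series(
    {"Arabic": 0.50, "English": 0.10, "German": 0.45, "Spanish": 0.17},
    name="conversion_rate",
)

# Index every language to English: English becomes 1.0 by construction.
language_index = lang_conv / lang_conv["English"]
print(language_index)
```

An index above 1 means the language typically converts better than English; an index below 1 means it converts worse.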

4. Interpreting Indexes

These indexes mean that Spanish-speaking users typically convert at about 1.7 times the rate of English speakers, while Arabic and German speakers convert at about 4 to 5 times the rate of English speakers.

5. Daily conversion

Next, we calculate the total number of users and actual conversions on each day. First, we group the DataFrame by date_served and language_preferred. Next, we do something different. We use the agg() method since we wish to calculate multiple statistics. We pass a dictionary to this method where the key is the column name, and the value is the method we want to apply on the column. Thus, we calculate the total number of unique user ids and the total number of users who converted.

6. Daily conversion

Finally, we unstack our result with level equals one to make it easier to manipulate in future steps. The result is a DataFrame with the number of users who should have seen ads in each language and how many of those users converted each day.
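
A minimal sketch of the group, aggregate, and unstack steps, using a small invented house_ads DataFrame. The column names follow the lesson; the rows and the variable names converted and converted_df are assumptions.

```python
import pandas as pd

# Toy data: six users over two days (values are made up).
house_ads = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "date_served": pd.to_datetime(["2018-01-11"] * 3 + ["2018-01-12"] * 3),
    "language_preferred": ["English", "English", "Spanish",
                           "English", "Spanish", "Spanish"],
    "converted": [True, False, True, False, False, True],
})

# Dictionary keys are column names; values are the aggregation to apply.
converted = house_ads.groupby(["date_served", "language_preferred"]).agg(
    {"user_id": "nunique", "converted": "sum"}
)

# Move the second index level (language_preferred) into the columns.
converted_df = converted.unstack(level=1)
print(converted_df)
```

After unstacking, each row is one date and the columns form two levels: the statistic (user_id or converted) and the language.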

7. Create English conversion rate column

Since grouping by date_served and then unstacking leaves date_served in the DataFrame's index, we can use the loc accessor to slice our DataFrame and retrieve only the rows from the period where the language bug was a problem. Our DataFrame has multi-level column names, so we can access the total number of people who converted for the English language by putting the two names in parentheses as a tuple, first writing "converted" because that is the first level of the column structure and then the relevant language. In this case, converted comma English.
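
To illustrate the slicing and the two-level column lookup, here is a small sketch on a hand-built DataFrame with the same column structure. The dates and counts are invented, and the bug window used in the slice is an assumption.

```python
import pandas as pd

# Dates in the index, (statistic, language) pairs in the columns.
dates = pd.date_range("2018-01-09", periods=5, freq="D", name="date_served")
columns = pd.MultiIndex.from_product(
    [["user_id", "converted"], ["English", "Spanish"]]
)
converted_df = pd.DataFrame(
    [[10, 4, 2, 1],
     [12, 5, 3, 1],
     [11, 6, 1, 0],
     [9, 5, 2, 0],
     [10, 4, 1, 0]],
    index=dates,
    columns=columns,
)

# Label-based slicing on a date index includes both endpoints.
bug_period = converted_df.loc["2018-01-11":"2018-01-13"]

# A tuple selects one column: first level "converted", second level "English".
english_converted = bug_period[("converted", "English")]
print(english_converted)
```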

8. Calculating daily expected conversion rate

Next, we can multiply the actual English conversion rate during this time by the language indexes we created earlier to determine what the expected conversion rates for these languages would have been for each day.
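
One way this multiplication might look, assuming a flat DataFrame named daily with English user and conversion counts per day and a language_index Series like the one built earlier. All names and numbers here are hypothetical.

```python
import pandas as pd

dates = pd.date_range("2018-01-11", periods=3, freq="D", name="date_served")

# Made-up daily English totals during the bug period.
daily = pd.DataFrame(
    {"english_users": [100, 200, 50], "english_converted": [10, 10, 5]},
    index=dates,
)

# Made-up language indexes relative to English.
language_index = pd.Series({"Spanish": 2.0, "German": 4.0})

# Actual English conversion rate on each day.
daily["english_conv_rate"] = daily["english_converted"] / daily["english_users"]

# Expected rate for each language = English rate that day * language index.
for language, index_value in language_index.items():
    column = f"expected_{language.lower()}_rate"
    daily[column] = daily["english_conv_rate"] * index_value

print(daily)
```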

9. Calculating daily expected conversions

Then, we multiply the daily expected conversion rate of each language by the number of users who should have seen ads in that language. This gives us how many subscribers we would have expected if the language bug had not occurred.
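
Continuing in the same spirit, expected conversions are the expected rate times the number of users, column by column. The DataFrame below and its column names are assumptions chosen for the example.

```python
import pandas as pd

dates = pd.date_range("2018-01-11", periods=3, freq="D", name="date_served")

# Made-up expected rates and user counts per language and day.
daily = pd.DataFrame(
    {
        "expected_spanish_rate": [0.2, 0.1, 0.2],
        "spanish_users": [10, 20, 5],
        "expected_german_rate": [0.4, 0.2, 0.4],
        "german_users": [5, 10, 5],
    },
    index=dates,
)

# Expected subscribers = expected rate * users who should have seen that language.
for language in ["spanish", "german"]:
    daily[f"expected_{language}_conv"] = (
        daily[f"expected_{language}_rate"] * daily[f"{language}_users"]
    )

print(daily[["expected_spanish_conv", "expected_german_conv"]])
```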

10. Determining the number of lost subscribers

To calculate the overall impact, we limit the expected conversion dataset to only the days when the bug occurred. Next, we sum the expected and the actual subscribers during that period separately. Finally, we take the difference between the number of subscribers we expected and the number we received, which gives us an estimate of how many subscribers we lost due to the language error.
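
The final tally can be sketched as a slice, two sums, and a subtraction. The figures, the column names, and the bug window below are invented for illustration, and only one language is shown to keep the example short.

```python
import pandas as pd

dates = pd.date_range("2018-01-09", periods=5, freq="D", name="date_served")

# Made-up expected vs. actual Spanish subscribers per day.
daily = pd.DataFrame(
    {
        "expected_spanish_conv": [2.0, 2.0, 2.0, 1.5, 2.5],
        "actual_spanish_conv": [2, 2, 0, 1, 0],
    },
    index=dates,
)

# Keep only the days affected by the bug (assumed window).
bug_period = daily.loc["2018-01-11":"2018-01-13"]

# Sum expected and actual subscribers separately, then take the difference.
expected_subs = bug_period["expected_spanish_conv"].sum()
actual_subs = bug_period["actual_spanish_conv"].sum()
lost_subs = expected_subs - actual_subs

print(f"Estimated lost subscribers: {lost_subs}")
```

With several languages, the same sums would be taken over all of the expected and actual columns before subtracting.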

11. Let's practice!

Now it's your turn. Let's assess the impact of this bug!