1. Basic Apriori results pruning

Now that you know how to apply the Apriori algorithm, let's see how we can use its output to generate and prune association rules.

2. Apriori and association rules

Let's recall where Apriori left us. Apriori is applied to itemsets and tells us only about their frequency: either an itemset's support exceeds a minimum threshold or it doesn't. Additionally, modified versions of Apriori, such as the one we applied from mlxtend, also allow for pruning by itemset length. But none of this tells us about association rules. In fact, there are many more association rules than itemsets. For instance, the aggregated itemset containing bags and boxes could generate the rule "if bags then boxes" or "if boxes then bags." This, of course, becomes a much more serious problem for larger itemsets, where each frequent itemset can generate many candidate rules.

3. How to compute association rules

In chapters 1 and 2, we computed association rules ourselves, but how can we do this for the output of the Apriori algorithm, even if it contains 3-item, 4-item, 5-item, and larger itemsets? And how can we ensure that this doesn't undo the pruning we did in the Apriori step, yielding an unmanageable number of rules? Again, mlxtend offers a simple means of enumerating and pruning association rules.

4. How to compute association rules

Let's work through an example. We'll start by importing both apriori and association_rules from mlxtend. We'll then load the one-hot encoded novelty gift data and apply the Apriori algorithm. We now have a DataFrame of frequent itemsets according to our choice of support value. The next step is to apply the association_rules function, passing to it the frequent itemsets DataFrame, a metric, and a minimum threshold. We've selected a support threshold of 0.0, which will apply no pruning.
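A minimal sketch of this workflow is below. The file name and the min_support value of 0.006 are illustrative assumptions rather than values taken from the lesson; the key step is passing the frequent itemsets to association_rules with metric="support" and min_threshold=0.0, which leaves the rule set unpruned.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded novelty gift data: one row per transaction and one
# boolean column per item (file name and min_support are assumptions).
onehot = pd.read_csv('online_retail_onehot.csv').astype(bool)

# Apply Apriori to recover the frequent itemsets as a DataFrame.
frequent_itemsets = apriori(onehot, min_support=0.006, use_colnames=True)

# Generate association rules from the frequent itemsets.
# metric="support" with min_threshold=0.0 applies no pruning at all.
rules = association_rules(frequent_itemsets,
                          metric="support",
                          min_threshold=0.0)
```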

5. The importance of pruning

Let's count the number of rules. Without applying any pruning, we get 79,506. And how many frequent itemsets did we have after applying Apriori? Only 4,708. The association rule generation stage dramatically increased the number of objects we have to consider, since we applied no pruning.
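Continuing the sketch above, the comparison is just a pair of length checks; the counts in the comments are the ones reported in the lesson.

```python
# Compare the number of rules with the number of frequent itemsets.
print(len(rules))               # 79,506 rules without pruning
print(len(frequent_itemsets))   # 4,708 frequent itemsets
```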

6. The importance of pruning

What if we compute the rules again, but use a non-zero threshold for support? Trying 0.001, we see an enormous difference: instead of returning 79,506 rules, we generate only two.
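In code, only the threshold changes; the rest of the sketch stays the same.

```python
# Recompute the rules with a non-zero support threshold.
rules = association_rules(frequent_itemsets,
                          metric="support",
                          min_threshold=0.001)

print(len(rules))   # only 2 rules remain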

7. Exploring the set of rules

Printing the column headers, we can see that the rules DataFrame contains the antecedents, consequents, antecedent support, consequent support, and a number of metrics. All of this is computed automatically whenever we apply the association_rules function. If we print the antecedents and consequents columns, we can see that the two rules identified come from the same itemset. The first rule states "if JUMBO BAG RED RETROSPOT then BIRTHDAY CARD, RETRO SPOT."
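Inspecting the rules is ordinary pandas work on the DataFrame returned above.

```python
# Columns computed automatically by association_rules
# (antecedents, consequents, supports, confidence, lift, and so on).
print(rules.columns)

# Antecedents and consequents of the two surviving rules.
print(rules[['antecedents', 'consequents']])
```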

8. Pruning with other metrics

What if we'd prefer to prune with other metrics? Perhaps we'd rather use a metric like lift, which is computed automatically, to remove rules that perform poorly. Alternatively, let's say that we want to use support, but only care about the support of the antecedent itemset. We can set the metric accordingly and, for instance, use a threshold of 0.002. We can see that doing this yields a substantial reduction in rules from 79,506 to 3,899. More generally, the association_rules function can apply a threshold for any of the standard metrics we discussed earlier, including lift, confidence, leverage, and conviction.
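A sketch of both variants is below. It assumes mlxtend accepts "antecedent support" as a metric string, and the lift threshold of 1.0 is an illustrative choice, not a value from the lesson.

```python
# Prune on the antecedent's support rather than the joint support.
rules = association_rules(frequent_itemsets,
                          metric="antecedent support",
                          min_threshold=0.002)
print(len(rules))   # 3,899 rules

# Any standard metric works the same way, e.g. lift (threshold assumed).
rules = association_rules(frequent_itemsets,
                          metric="lift",
                          min_threshold=1.0)
```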

9. Let's practice!

You now understand how to compute and prune association rules, starting from the Apriori algorithm's frequent itemset output. Let's try that in some exercises.
