1. Advanced Apriori results pruning
Pruning Apriori algorithm results by standard metrics won't always yield useful association rules. When this happens, we'll need to apply advanced filtering techniques, making use of multiple metrics, custom metrics, aggregation, and other strategies.
2. Applications
In this video, we'll apply advanced filtering techniques to the novelty gift store dataset.
We'll consider two use cases: first, using cross-promotion to sell a targeted consequent; and second, using aggregation and Zhang's metric to select the layout for a new store.
3. The Apriori algorithm
For both use cases, we'll use the same pipeline: take in a list of lists, one-hot encode the transactions, and then apply the Apriori algorithm to prune the itemsets.
4. The Apriori algorithm
Let's recall how we implement this in code. We'll first import pandas, numpy, TransactionEncoder from mlxtend.preprocessing, and apriori and association_rules from mlxtend.frequent_patterns. We'll then load the data as a list of lists.
5. The Apriori algorithm
We next instantiate a TransactionEncoder and use it to transform the list of lists into one-hot encoded data. We then apply the Apriori algorithm to the one-hot encoded DataFrame.
6. Apriori algorithm results
We now have a list of frequent itemsets and their supports. Notice that Apriori pruned the list from 4201 itemsets to 2328. We can now compute the standard set of metrics for each rule by applying the association_rules() function we imported earlier.
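To make the metrics less of a black box, here is a pure-pandas sketch that computes a few of them for a single rule using their standard definitions. The one-hot data is hypothetical, and the helper rule_metrics is our own illustration, not part of mlxtend; the dictionary keys simply mirror column names that association_rules() reports.

```python
import pandas as pd

# Hypothetical one-hot encoded transactions (rows = transactions, columns = items)
onehot = pd.DataFrame({
    "herb marker basil": [True, False, False, True, False],
    "herb marker thyme": [True, True, False, True, False],
})

def rule_metrics(onehot, antecedent, consequent):
    """Standard metrics for the single-item rule antecedent -> consequent."""
    supp_a = onehot[antecedent].mean()                          # antecedent support
    supp_c = onehot[consequent].mean()                          # consequent support
    supp_ac = (onehot[antecedent] & onehot[consequent]).mean()  # rule support
    confidence = supp_ac / supp_a    # share of antecedent transactions that also hold the consequent
    lift = confidence / supp_c       # > 1 means co-occurrence above what independence predicts
    return {"antecedent support": supp_a, "consequent support": supp_c,
            "support": supp_ac, "confidence": confidence, "lift": lift}

metrics = rule_metrics(onehot, "herb marker basil", "herb marker thyme")
print(metrics)
```

Confidence is directional (swapping antecedent and consequent changes it), while lift and support are symmetric.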
7. Association rules
This yields 239 rules. Now, let's say we show this list to the gift store manager and she tells us to cross-promote the herb marker thyme item.
8. Filtering with multiple metrics
We can start by restricting the set of rules to those that have "herb marker thyme" as the consequent. We'll then apply a multi-metric filter that requires the antecedent's support to be greater than 0.01, the rule's support to be greater than 0.009, the confidence to be greater than 0.85, and the lift to be greater than 1.
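The two-step filter can be sketched with ordinary pandas boolean indexing. The rules table below is hypothetical: it reuses column names that association_rules() returns, but every number is made up for illustration, and the item names are placeholders rather than the dataset's actual labels.

```python
import pandas as pd

# Hypothetical rules table using association_rules() column names; numbers are made up
rules = pd.DataFrame({
    "antecedents": [frozenset({"herb marker basil"}),
                    frozenset({"herb marker parsley"}),
                    frozenset({"candle"}),
                    frozenset({"herb marker rosemary"})],
    "consequents": [frozenset({"herb marker thyme"}),
                    frozenset({"herb marker thyme"}),
                    frozenset({"herb marker thyme"}),
                    frozenset({"bag"})],
    "antecedent support": [0.012, 0.0112, 0.050, 0.013],
    "support": [0.0108, 0.0098, 0.011, 0.010],
    "confidence": [0.90, 0.875, 0.22, 0.77],
    "lift": [72.0, 70.0, 17.6, 7.7],
})

# Step 1: keep only rules whose consequent contains the target item
target = "herb marker thyme"
thyme_rules = rules[rules["consequents"].apply(lambda items: target in items)]

# Step 2: every metric threshold must hold at the same time
filtered = thyme_rules[
    (thyme_rules["antecedent support"] > 0.01)
    & (thyme_rules["support"] > 0.009)
    & (thyme_rules["confidence"] > 0.85)
    & (thyme_rules["lift"] > 1.0)
]
print(filtered[["antecedents", "consequents"]])
```

Here the candle rule passes the support and lift thresholds but is dropped by the confidence threshold, which shows why combining metrics matters.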
When we filter on multiple metrics, we'll typically arrive at the final rule by applying increasingly stringent filters to different metrics. In this case, we might start with support, which leaves many rules built from the same itemsets. Since rules derived from the same itemset share the same support, we then add confidence to discriminate between them. Finally, we'll add lift to narrow the set of rules further, discarding those whose antecedent and consequent co-occur no more often than they would if they were independent.
Printing the results, we can see that basil, parsley, or rosemary herb markers could be used to cross-promote the thyme herb marker.
9. Grouping products
Let's look at another quick example of advanced filtering. The store we're advising is relocating to a horseshoe-shaped building. The manager is trying to decide how to arrange the sections of the store and wants our input. She offers three options: (1) group boxes with bags and signs with candles; (2) group boxes with candles and signs with bags; or (3) group boxes with signs and candles with bags.
She tells us that our objective should be to keep dissociated items far apart. That means we need a metric that can detect dissociation as well as association, such as Zhang's metric.
10. Aggregation and dissociation
We'll start by loading the data, which has been aggregated into four categories: bags, boxes, candles, and signs. We'll then apply Zhang's rule, which we defined in Chapter 2. Recall that Zhang's rule provides a continuous measure of net association over the interval from -1 to +1.
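For reference, one common way to write Zhang's metric for a rule A → B in terms of supports is shown below, where Supp(AB) is the share of transactions containing both A and B. We assume this matches the Chapter 2 definition. The numerator compares the observed joint support with what independence would predict, and the denominator scales the result into the interval from -1 to +1.

$$
\operatorname{Zhang}(A \rightarrow B) =
\frac{\operatorname{Supp}(AB) - \operatorname{Supp}(A)\,\operatorname{Supp}(B)}
{\max\bigl\{\operatorname{Supp}(AB)\,\bigl(1 - \operatorname{Supp}(A)\bigr),\ \operatorname{Supp}(A)\,\bigl(\operatorname{Supp}(B) - \operatorname{Supp}(AB)\bigr)\bigr\}}
$$

If A and B never appear together, Supp(AB) = 0 and the metric equals -1 (perfect dissociation); if the numerator is zero, A and B are independent.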
11. Zhang's rule
Let's print the results and recall that our assignment was to avoid grouping dissociated items. We can quickly see that bags and candles, and signs and bags should not be paired.
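A sketch of the aggregation and scoring steps in pure pandas. The item-level data, item names, and item-to-category mapping are all hypothetical, and the zhang helper is our own implementation of the support-based formula for Zhang's metric, not a library function. Because Zhang's metric is directional, each pair is scored in the order itertools.combinations produces it.

```python
import itertools
import pandas as pd

# Hypothetical item-level one-hot data (rows = transactions); item names are made up
items = pd.DataFrame({
    "paper bag":  [1, 0, 1, 0, 0, 0],
    "jumbo bag":  [0, 1, 1, 0, 0, 0],
    "gift box":   [1, 1, 1, 0, 1, 0],
    "tea light":  [0, 0, 0, 1, 1, 0],
    "metal sign": [0, 0, 0, 1, 1, 1],
}).astype(bool)

# Hypothetical mapping from each item to one of the four aggregate categories
category_of = {"paper bag": "bags", "jumbo bag": "bags", "gift box": "boxes",
               "tea light": "candles", "metal sign": "signs"}

# Aggregate: a transaction contains a category if it contains any of its items
categories = pd.DataFrame({
    cat: items[[item for item, c in category_of.items() if c == cat]].any(axis=1)
    for cat in sorted(set(category_of.values()))
})

def zhang(onehot, a, b):
    """Zhang's metric for the rule a -> b, computed from column supports."""
    supp_a = onehot[a].mean()
    supp_b = onehot[b].mean()
    supp_ab = (onehot[a] & onehot[b]).mean()
    numerator = supp_ab - supp_a * supp_b
    denominator = max(supp_ab * (1 - supp_a), supp_a * (supp_b - supp_ab))
    return numerator / denominator

# Score every pair of categories; negative scores signal dissociation
scores = {(a, b): zhang(categories, a, b)
          for a, b in itertools.combinations(categories.columns, 2)}
for (a, b), score in scores.items():
    label = "dissociated" if score < 0 else "not dissociated"
    print(f"{a} -> {b}: {score:+.2f} ({label})")
```

In this toy data, bags never co-occur with candles or signs, so both of those pairs score exactly -1.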
12. Selecting a floorplan
Recall that Zhang's rule ranges from -1 to +1: more positive values indicate stronger association, and more negative values indicate stronger dissociation.
Applying Zhang's rule, we are able to eliminate two of the proposed floor plans, leaving us with the one where bags and boxes, and signs and candles are paired.
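The elimination step can be sketched as a simple rule: reject any floor plan that groups two dissociated categories. The scores below are made-up placeholders (only their signs for bags/candles and bags/signs follow the narrative), and keying them by unordered pair is an assumption, since Zhang's metric is directional; one option in practice is to keep the lower of the two directions.

```python
# Made-up pairwise Zhang scores, one per unordered pair of categories (placeholders)
zhang_scores = {
    frozenset({"bags", "boxes"}): 0.6,
    frozenset({"candles", "signs"}): 0.7,
    frozenset({"boxes", "candles"}): 0.1,
    frozenset({"boxes", "signs"}): 0.2,
    frozenset({"bags", "signs"}): -0.8,
    frozenset({"bags", "candles"}): -0.9,
}

# The manager's three proposed layouts, each made of two groupings
floor_plans = {
    "plan 1": [("boxes", "bags"), ("signs", "candles")],
    "plan 2": [("boxes", "candles"), ("signs", "bags")],
    "plan 3": [("boxes", "signs"), ("candles", "bags")],
}

def acceptable(plan):
    """A plan is acceptable if none of its groupings pairs dissociated categories."""
    return all(zhang_scores[frozenset(pair)] >= 0 for pair in plan)

viable = [name for name, plan in floor_plans.items() if acceptable(plan)]
print(viable)
```

Filtering rather than ranking mirrors the manager's instruction: any dissociated pairing disqualifies a plan, regardless of how strongly its other pairing is associated.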
13. Let's practice!
It's now time to put what you've learned to use in some exercises.