Metrics in market basket analysis

1. Metrics in market basket analysis

Before we can extract rules from the transactional dataset, let's go through the main metrics used in Market Basket Analysis.

2. Metrics used for rule extraction

Let's use the previous transactional dataset. Our goal is to extract association rules from these transactions. An example of a rule could be "Bread" implies "Butter", meaning that if customers purchase "Bread", then they also purchase "Butter". "Bread" is known as the antecedent of the rule and "Butter" as its consequent. Another example of an association rule is that if both "Butter" and "Cheese" are purchased together, then "Wine" is purchased as well. To quantify how reliable a rule is, we first need to introduce some metrics.

3. Support measure

The first metric is support. It represents the popularity of an itemset. The support of an itemset X, supp(X), is the proportion of transactions in which the itemset X appears. For instance, "Bread" appears in 3 of the 7 baskets, so its support is 3/7, meaning that "Bread" is present in about 43% of the baskets. Likewise, "Bread" and "Butter" appear together in 3 of the 7 baskets, again about 43%.
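The support calculation can be sketched in a few lines of Python. The seven baskets below are a hypothetical dataset, not the lesson's table; they are an assumption chosen only so that the figures quoted above come out the same. The `support` helper is likewise illustrative.

```python
# Hypothetical 7-basket dataset (an assumption; chosen to match the quoted figures).
baskets = [
    {"Bread", "Butter", "Cheese", "Wine"},
    {"Bread", "Butter", "Wine"},
    {"Bread", "Butter"},
    {"Butter", "Cheese", "Wine"},
    {"Butter", "Cheese"},
    {"Cheese", "Wine"},
    {"Butter", "Wine"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


supp_bread = support({"Bread"}, baskets)
supp_bread_butter = support({"Bread", "Butter"}, baskets)

print(f"supp(Bread)         = {supp_bread:.3f}")         # 3/7
print(f"supp(Bread, Butter) = {supp_bread_butter:.3f}")  # 3/7
```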

4. Confidence measure

The confidence of the rule X implies Y is defined as the support of the union of X and Y divided by the support of X. It shows the percentage of baskets containing X in which Y is also bought. It is an indication of how often the rule has been found to be true. For instance, the confidence of the rule "Bread" implies "Butter" is 100%, meaning that every basket containing "Bread" also contains "Butter". Note that confidence is directional: the rule "Butter" implies "Bread" does not necessarily have the same confidence.
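A minimal Python sketch of confidence, again on a hypothetical seven-basket dataset (an assumption, not the lesson's table). It also shows that swapping antecedent and consequent changes the result; the helper names are illustrative.

```python
# Hypothetical 7-basket dataset (an assumption; not the lesson's actual table).
baskets = [
    {"Bread", "Butter", "Cheese", "Wine"},
    {"Bread", "Butter", "Wine"},
    {"Bread", "Butter"},
    {"Butter", "Cheese", "Wine"},
    {"Butter", "Cheese"},
    {"Cheese", "Wine"},
    {"Butter", "Wine"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """conf(lhs -> rhs) = supp(lhs and rhs together) / supp(lhs)."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


conf_bread_butter = confidence({"Bread"}, {"Butter"}, baskets)
conf_butter_bread = confidence({"Butter"}, {"Bread"}, baskets)

print(f"conf(Bread -> Butter) = {conf_bread_butter:.2f}")  # every Bread basket has Butter
print(f"conf(Butter -> Bread) = {conf_butter_bread:.2f}")  # only half the Butter baskets have Bread
```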

5. Lift measure

The lift represents the likelihood of the itemset Y being purchased when the itemset X is purchased, while taking into account the popularity of Y. It is defined as the ratio of the observed support of X and Y together to the support expected if X and Y were independent, that is, supp(X and Y) divided by supp(X) times supp(Y). A lift greater than 1 means that Y is more likely to be bought when X is bought, while a lift less than 1 means that Y is less likely to be bought when X is bought. For instance, the lift of "Bread" implies "Butter" is about 1.17, meaning that Bread and Butter occur together about 1.17 times more often than would be expected if they were independent.
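The lift formula can be checked with a short Python sketch. The baskets are a hypothetical dataset (an assumption, not the lesson's table) in which "Butter" appears in 6 of 7 baskets, which is the frequency implied by the lift value quoted above.

```python
# Hypothetical 7-basket dataset (an assumption; not the lesson's actual table).
baskets = [
    {"Bread", "Butter", "Cheese", "Wine"},
    {"Bread", "Butter", "Wine"},
    {"Bread", "Butter"},
    {"Butter", "Cheese", "Wine"},
    {"Butter", "Cheese"},
    {"Cheese", "Wine"},
    {"Butter", "Wine"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def lift(lhs, rhs, baskets):
    """Observed joint support divided by the support expected under independence."""
    observed = support(set(lhs) | set(rhs), baskets)
    expected = support(lhs, baskets) * support(rhs, baskets)
    return observed / expected


lift_bread_butter = lift({"Bread"}, {"Butter"}, baskets)
print(f"lift(Bread -> Butter) = {lift_bread_butter:.3f}")  # (3/7) / ((3/7) * (6/7)) = 7/6
```

Unlike confidence, lift is symmetric: swapping X and Y leaves the ratio unchanged.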

6. The apriori function for frequent itemsets

The apriori function from the arules package lets us compute these metrics. We will look more closely at this function in the next lesson, where we will use it to extract rules. The function has three main arguments. The first is the transactional dataset. The second is the parameter list. In it, the target keyword specifies whether we want frequent itemsets or rules to be extracted. We can also specify minimum support and confidence thresholds below which itemsets and rules are not considered, in this example 0.2 and 0.4 respectively. As we will see in more detail later on, these thresholds are needed because of the large number of possible rules. The minimum rule length can be set with the "minlen" argument, for instance 2. The last argument of the function is "appearance", a list that lets us specify which items should be included, either in the frequent itemsets or on either side of an association rule.
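To make the role of the minimum support and "minlen" settings concrete, here is a brute-force Python sketch. It is not the arules apriori function and does not use the Apriori pruning strategy; it simply tests every combination of items against the thresholds. The dataset and the `frequent_itemsets` helper are hypothetical.

```python
from itertools import combinations

# Hypothetical 7-basket dataset (an assumption; not the lesson's actual table).
baskets = [
    {"Bread", "Butter", "Cheese", "Wine"},
    {"Bread", "Butter", "Wine"},
    {"Bread", "Butter"},
    {"Butter", "Cheese", "Wine"},
    {"Butter", "Cheese"},
    {"Cheese", "Wine"},
    {"Butter", "Wine"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def frequent_itemsets(baskets, min_support, min_len=1):
    """Brute force: keep every itemset of at least `min_len` items with enough support."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(min_len, len(items) + 1):
        for combo in combinations(items, size):
            s = support(combo, baskets)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


# Thresholds taken from the text: minimum support 0.2, minimum length 2.
frequent = frequent_itemsets(baskets, min_support=0.2, min_len=2)
for itemset, s in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(s, 3))
```

Even with only four distinct items there are 15 non-empty itemsets to check, and the count doubles with every additional item, which is why thresholds matter on real data.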

7. The apriori function for rules

Likewise, we can generate association rules. The only difference here is that we set "rules" as the target and adjust the "appearance" parameter. On the previous slide we retrieved the support of "Cheese" and "Wine". In this case, we want "Butter" on the right-hand side of the rules, so we set the rhs argument to "Butter".

8. Frequent itemsets with apriori

To retrieve all itemsets with a support higher than 3/7, we adapt the parameter list. To obtain the most frequent itemsets, we use the sort function and specify support as the metric to sort on. We then inspect the first few records. "Butter" and "Wine" are the most popular items. Does this match our transactional table?

9. Inspect confidence and lift measures

Using the apriori function, let's find the metrics for rules containing the item "Butter" on the right-hand side. In this case, we need to change the target to "rules" rather than "frequent itemsets". We then sort the rules by lift.

10. Inspect confidence and lift measures

The result is a list of rules with "Butter" on the right-hand side, together with all the metrics seen previously. Note that four rules (all containing "Bread") have the highest lift and the same confidence.
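The idea of fixing the consequent and ranking rules by lift can be sketched in plain Python. This is not the arules API, and the dataset and `rules_with_rhs` helper are hypothetical, so the rules printed here need not match the lesson's output. The thresholds 0.2 and 0.4 are the ones quoted earlier.

```python
from itertools import combinations

# Hypothetical 7-basket dataset (an assumption; not the lesson's actual table).
baskets = [
    {"Bread", "Butter", "Cheese", "Wine"},
    {"Bread", "Butter", "Wine"},
    {"Bread", "Butter"},
    {"Butter", "Cheese", "Wine"},
    {"Butter", "Cheese"},
    {"Cheese", "Wine"},
    {"Butter", "Wine"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rules_with_rhs(baskets, rhs_item, min_support, min_confidence):
    """Enumerate rules lhs -> {rhs_item} and return them sorted by lift, highest first."""
    rhs = frozenset([rhs_item])
    supp_rhs = support(rhs, baskets)
    other_items = sorted(set().union(*baskets) - rhs)
    rules = []
    for size in range(1, len(other_items) + 1):
        for lhs in combinations(other_items, size):
            supp_rule = support(set(lhs) | rhs, baskets)
            supp_lhs = support(lhs, baskets)
            if supp_lhs == 0 or supp_rule < min_support:
                continue
            conf = supp_rule / supp_lhs
            if conf < min_confidence:
                continue
            rules.append((lhs, supp_rule, conf, conf / supp_rhs))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


rules = rules_with_rhs(baskets, "Butter", min_support=0.2, min_confidence=0.4)
for lhs, supp, conf, lift in rules:
    print(f"{set(lhs)} -> Butter  supp={supp:.3f}  conf={conf:.3f}  lift={lift:.3f}")
```

On this toy data the rules whose left-hand side contains "Bread" come out on top, because every basket with "Bread" also contains "Butter".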

11. Let's practice!

Now it's your turn to compute support, lift and confidence metrics on the Online Retail dataset.
