The Apriori algorithm
1. The Apriori algorithm
Time to dive into the backbone of Market Basket Analysis with one of the most popular algorithms in the field: the Apriori algorithm.
2. Association rule mining
Market basket analysis is more generally known as association rule mining. It allows us to discover relationships between items in a large transactional dataset. To extract interesting rules from the transactional dataset, association rule mining can be decomposed into two subtasks. First, the set of frequent itemsets satisfying the minimum support threshold is generated. Second, rules with high confidence are extracted from this set of frequent itemsets. The Apriori algorithm is a fast mining algorithm that is part of the association rule mining family.
3. Idea behind the Apriori algorithm
The Apriori algorithm uses a bottom-up approach, where frequent subsets are extended one item at a time. This step is known as candidate generation, and groups of candidates are tested against the dataset. The algorithm makes use of the apriori principle to generate candidate itemsets efficiently:
- If an itemset such as {A,B} is frequent, then both of its subsets {A} and {B} are frequent.
- Likewise, if an itemset such as {A} is infrequent, then all of its supersets, such as {A,B}, {A,C} and {A,B,C}, are infrequent.
4. Example: 1-itemsets
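The counting of 1-itemset supports can be sketched in plain Python. The seven-transaction dataset below is my own invention, chosen only to be consistent with the counts quoted in this example (A occurs 3 times; B, C and D are also frequent) — it is not necessarily the dataset shown on the slide:

```python
from collections import Counter

# hypothetical seven-transaction dataset, consistent with the example's counts
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "B", "D"},
    {"B", "C", "D"}, {"B", "C", "D"}, {"C", "D"}, {"B"},
]
min_support = 3 / 7  # minimum support threshold from the example

# count in how many transactions each individual item appears
counts = Counter(item for t in transactions for item in t)

# keep the items whose relative support meets the threshold
frequent_1 = {item for item, n in counts.items() if n / len(transactions) >= min_support}
print(sorted(frequent_1))  # → ['A', 'B', 'C', 'D']
```

All four items survive this first pass, so all six candidate 2-itemsets are generated from them.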
To illustrate how the Apriori algorithm works, imagine the following transactional dataset and the corresponding itemset graph. Suppose that the minimum support threshold is 3/7. The algorithm starts by counting the number of occurrences of A; in this case it appears 3 times. Similarly, by scanning the dataset, items B, C and D are flagged as frequent items, since they all satisfy the minimum support threshold. These frequent 1-itemsets are used to generate candidate 2-itemsets.
5. Example: 2-itemsets
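The same counting applied to the candidate 2-itemsets can be sketched as follows, again on a hypothetical dataset made up to match the example's counts:

```python
from itertools import combinations

# hypothetical seven-transaction dataset, consistent with the example's counts
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "B", "D"},
    {"B", "C", "D"}, {"B", "C", "D"}, {"C", "D"}, {"B"},
]
min_count = 3  # a minimum support of 3/7 means at least 3 occurrences

# count the support of every candidate pair of frequent items
pairs = [set(p) for p in combinations("ABCD", 2)]
support_counts = {frozenset(p): sum(1 for t in transactions if p <= t) for p in pairs}

# pairs that fall below the threshold
infrequent = sorted("".join(sorted(s)) for s, n in support_counts.items() if n < min_count)
print(infrequent)  # → ['AC', 'AD']
```

Only {A,C} and {A,D} fall below the threshold, matching the pruning described next.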
For the 2-itemsets, neither itemset {A,C} nor {A,D} satisfies the minimum support threshold. By the apriori principle, this means that none of their supersets can be frequent either.
6. Example: 3-itemsets
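The candidate-generation step with apriori pruning can be sketched as below. The function name is my own, and the join step is deliberately simplified (it enumerates all k-combinations of items rather than the classic join of (k-1)-itemsets), but the pruning rule is the one just described:

```python
from itertools import combinations

def generate_candidates(frequent, k):
    """Build k-itemset candidates from the items of the frequent (k-1)-itemsets,
    pruning any candidate that has an infrequent (k-1)-subset."""
    items = sorted({i for s in frequent for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        # apriori principle: every (k-1)-subset must itself be frequent
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
    ]

# frequent 2-itemsets from the example ({A,C} and {A,D} were infrequent)
frequent_2 = {frozenset(p) for p in ({"A", "B"}, {"B", "C"}, {"B", "D"}, {"C", "D"})}
print([sorted(c) for c in generate_candidates(frequent_2, 3)])  # → [['B', 'C', 'D']]
```

Of the four possible 3-itemsets, only {B,C,D} survives: {A,B,C}, {A,B,D} and {A,C,D} all contain an infrequent 2-subset and are never counted against the dataset.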
Both infrequent itemsets {A,C} and {A,D} and their respective supersets are colored in orange on the graph. The exploration space has therefore been substantially reduced. The remaining 3-itemset candidate, the itemset {B,C,D}, is infrequent, as its support is only 2.
7. Example: frequent itemsets
Finally, all frequent itemsets are shown in blue on the graph and displayed in the table.
8. Apriori: rule generation
After the expensive step of retrieving frequent itemsets, the Apriori algorithm generates association rules. First, the algorithm fetches the high-confidence rules with a single item in the consequent. Afterwards, it searches for more complex rules, with more than one item on the right-hand side of the rule. The trick used by the Apriori algorithm is to prune some of the association rules. For instance, if the rule {B,C,D} implies {A} has low confidence, then all rules with item A in their consequent are discarded.
9. A first try with the apriori
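The confidence computation behind this pruning step can be sketched in Python. The helper names and the dataset are my own, the latter invented to be consistent with the running example:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # conf(X => Y) = supp(X ∪ Y) / supp(X)
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# hypothetical seven-transaction dataset, consistent with the example's counts
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "B", "D"},
    {"B", "C", "D"}, {"B", "C", "D"}, {"C", "D"}, {"B"},
]

# {B,C,D} => {A} has zero confidence here, so any further rule built from
# this itemset with A in its consequent can be discarded without testing
print(confidence({"B", "C", "D"}, {"A"}, transactions))  # → 0.0
```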
To retrieve the list of frequent itemsets in R, we use the "apriori" function. Its arguments are the transactional dataset and the parameter list containing: 1. the minimum support threshold, denoted "supp", and 2. the "target", set to frequent itemsets.
10. Output of the apriori - frequent itemsets
By inspecting the object, we obtain a data frame containing the set of frequent itemsets, which matches the itemset graph we built earlier.
11. Extracting rules with the apriori function
To generate association rules and steer the output of extracted rules, different arguments can be used. The "parameter" argument contains values such as the support threshold, the confidence threshold and the minimum length of rules. The "control" argument allows us to influence the performance of the algorithm. The "appearance" argument allows us to specify constraints on the right- or left-hand side of the generated rules.
12. Extracting rules: output
We inspect the object to extract the generated rules and their corresponding measures. For instance, the first rule states that A implies B, with a support of 42% and a confidence of 100%.
13. Let's practice!
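Before you do, the measures quoted for that first rule can be checked by hand. On the invented dataset below (chosen so that every basket containing A also contains B), the rule's support works out to 3/7 ≈ 0.43 — in line with the roughly 42% quoted — and its confidence to 100%; the helper is my own:

```python
# hypothetical seven-transaction dataset, consistent with the example's counts
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "B", "D"},
    {"B", "C", "D"}, {"B", "C", "D"}, {"C", "D"}, {"B"},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# rule {A} => {B}: support = supp({A,B}), confidence = supp({A,B}) / supp({A})
rule_support = support({"A", "B"})
rule_confidence = support({"A", "B"}) / support({"A"})
print(round(rule_support, 2), rule_confidence)  # → 0.43 1.0
```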
Now it is your turn to practice the skills you have learned on the Online Retail dataset. Happy shopping!