Congratulations!

1. Congratulations!

Congratulations! You've now completed this course on market basket analysis in Python. You're ready to take the skills you've acquired and apply them to real-world market basket analysis problems. Let's review what we learned.

2. Transactions and itemsets

We started with an overview of transactions and itemsets, making use of a dataset from a small grocery store. A transaction is a list of unique items that were purchased together. In some cases, we used user libraries as transaction-like objects. An itemset is simply a collection of items and is often the object of study in market basket analysis.
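As a reminder of how these objects look in code, here is a minimal sketch. The four transactions are made up for illustration and are not the course's grocery dataset; the names `transactions`, `baskets`, and `pairs` are likewise assumptions.

```python
from itertools import combinations

# Hypothetical grocery transactions (illustrative only, not the course data).
transactions = [
    ["bread", "milk"],
    ["bread", "jam", "milk"],
    ["jam", "bread"],
    ["milk", "coffee"],
]

# A transaction is a list of unique items, so a set is a natural representation.
baskets = [frozenset(t) for t in transactions]

# All distinct items that appear in any transaction.
items = sorted(set().union(*baskets))

# Every two-item itemset that occurs in at least one transaction.
pairs = set()
for basket in baskets:
    pairs.update(frozenset(p) for p in combinations(sorted(basket), 2))

print(items)
print(sorted(sorted(p) for p in pairs))
```

Using frozensets makes itemsets hashable, so they can be collected in a set or used as dictionary keys.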

3. Association rules and metrics

We next discussed association rules and metrics. Association rules use an if-then structure with antecedents on the left-hand side and consequents on the right. The number of possible association rules is large, which necessitates the use of metrics and algorithms to perform pruning. We examined a broad variety of metrics, which can be used to evaluate the strength of associations and the frequency of itemsets. We showed how metrics could be used to prune itemsets.
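To make the idea of a metric concrete, the sketch below computes three commonly used ones, support, confidence, and lift, for the rule "if bread then milk". The transactions and the helper names `support`, `confidence`, and `lift` are assumptions made for this example, not the course's own code.

```python
# Hypothetical transactions (illustrative only).
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "jam", "milk"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk", "coffee"}),
]

def support(itemset, baskets):
    """Share of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Support of the full rule divided by support of the antecedent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by support of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

# Rule: if bread then milk.
rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
rule_lift = lift({"bread"}, {"milk"}, baskets)
print(rule_support, rule_confidence, rule_lift)
```

A lift below 1 suggests the two items appear together less often than they would if they were purchased independently, which is one reason to look at more than one metric before keeping a rule.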

4. Pruning and aggregation

We next considered two strategies for dealing with the large number of association rules. One strategy was to prune the rules, removing itemsets or rules that perform poorly with respect to certain metrics. Another strategy was to use aggregation, which condensed items into categories, reducing the number of possible associations in the dataset.
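Both strategies can be sketched in a few lines. The item-to-category mapping, the transactions, and the 0.5 support threshold below are all hypothetical choices made for illustration.

```python
# Hypothetical mapping from items to categories (an assumption for this sketch).
category_of = {
    "bread": "bakery",
    "bagel": "bakery",
    "milk": "dairy",
    "yogurt": "dairy",
    "coffee": "beverage",
}

# Hypothetical item-level transactions.
transactions = [
    {"bread", "milk"},
    {"bagel", "yogurt"},
    {"bread", "coffee"},
    {"bagel", "milk"},
]

def support(itemset, baskets):
    """Share of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

# Aggregation: replace each item by its category, so fewer distinct
# itemsets (and therefore fewer possible rules) remain.
aggregated = [frozenset(category_of[item] for item in t) for t in transactions]

item_level = support({"bread", "milk"}, transactions)
category_level = support({"bakery", "dairy"}, aggregated)

# Pruning: keep only the itemsets that meet a minimum support threshold.
candidates = [
    frozenset({"bakery", "dairy"}),
    frozenset({"bakery", "beverage"}),
    frozenset({"dairy", "beverage"}),
]
min_support = 0.5  # illustrative threshold
kept = [c for c in candidates if support(c, aggregated) >= min_support]
print(item_level, category_level, kept)
```

Aggregation trades detail for coverage: the category-level itemset appears in more transactions than any single item-level pair it summarizes.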

5. The Apriori algorithm

We also learned how to apply the Apriori algorithm, which identifies frequent itemsets using the Apriori principle. The algorithm is necessary because it is often not feasible to compute support for all possible itemsets, given time and computational constraints. The Apriori principle states that subsets of frequent itemsets are also frequent. Equivalently, any superset of an infrequent itemset is itself infrequent. Thus, we can begin by computing support for single-item itemsets and eliminating all supersets that contain an infrequent item. We can then repeat the process for itemsets with two items and so forth. Using this process, we are able to eliminate a large number of itemsets without even computing their support values.
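The level-by-level process described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not an optimized or library implementation; the function name `apriori`, the transactions, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Return {itemset: support} for all frequent itemsets (simplified sketch)."""
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: every single item is a candidate.
    current = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while current:
        # Support is computed only for candidates that survived pruning.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori principle): drop any candidate that has an
        # infrequent subset of size k - 1, without computing its support.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical transactions (illustrative only).
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "jam", "milk"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk", "coffee"}),
]

frequent = apriori(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this example "coffee" is infrequent on its own, so no itemset containing it is ever generated, and the three-item candidate is discarded because one of its two-item subsets is infrequent.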

6. Visualizing rules

In the final chapter, we learned how to visualize rules. We saw that heatmaps can be used when the number of items of interest is small, but the number of associations between those items is high. Heatmaps can help us to visually identify items of interest and patterns in the data that aren't immediately obvious from pruning. We next discussed how to create scatterplots, which can be used to visualize the relationship between two or more metrics in itemsets or association rules. We also saw that scatterplots can be modified to include a third metric. In general, we found that they were best used to guide the pruning process when there was a large number of rules or itemsets. Finally, we considered how to construct parallel coordinates plots for association rules with one antecedent and one consequent. We saw that parallel coordinates plots were similar to heatmaps, but exclusively visualized rules and not metrics. They also reduced visual clutter by eliminating the requirement to examine all pairwise relationships.
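As a rough illustration of what sits behind a heatmap, the sketch below builds the antecedent-by-consequent table that such a plot would color, using support as the cell value. It stops short of plotting, and the transactions and the name `matrix` are assumptions for this example.

```python
# Hypothetical transactions (illustrative only).
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "jam", "milk"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk", "coffee"}),
]

items = sorted({item for b in baskets for item in b})

def support(itemset):
    """Share of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

# Rows are antecedents, columns are consequents. Each cell holds the support
# of the one-antecedent, one-consequent rule; the diagonal is left empty.
matrix = {
    a: {c: (support({a, c}) if a != c else None) for c in items}
    for a in items
}

# Plain-text rendering of the table a heatmap would color.
print("".ljust(8) + "".join(c.ljust(8) for c in items))
for a in items:
    cells = ("-" if v is None else f"{v:.2f}" for v in matrix[a].values())
    print(a.ljust(8) + "".join(cell.ljust(8) for cell in cells))
```

A table like this can be handed to a plotting library to produce the heatmap itself; swapping support for another metric such as confidence or lift only changes the cell value.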

7. Congratulations!

Congratulations! You've now completed this course on market basket analysis in Python and are ready to begin solving real-world problems.