1. What is market basket analysis?
Now that we know what a basket is, time to introduce the rationale behind Market Basket Analysis.
2. Multiple baskets @ grocery store
Back to our favorite grocery store. Recall that we chose Bread and Butter from the store. Our friend also goes to the store and purchases Bread, Wine and Cheese. Both of us have chosen Bread and Cheese in combination. Is this a coincidence? It is hard to say based on only two baskets. Of course, we will not look only at individual baskets but at many baskets together, and this is where it becomes interesting.
Imagine that 100 customers visit the store, which gives us 100 baskets. If 80 of those baskets contain both Bread and Cheese, you may infer that there is some association between the two items.
Market Basket Analysis extracts insights by finding associations between items that occur together.
The outcome of this type of technique is, in simple terms, a set of rules that can be understood as “if this, then that”.
3. Market basket applications
By looking at multiple baskets, we will infer some interesting relationships between items.
Market Basket Analysis is used in many domains. For example, on an eCommerce website you see “customers who bought this also bought this” or “frequently bought together”.
Similarly in the retail world, you will see examples of items which are “bundled or placed together” based on what is frequently bought together.
On social media you will have seen “friends and connections recommendation”. How about Netflix’s and YouTube’s “videos and movies recommendation”?
Although other methods can integrate more information than Market Basket Analysis, all of these can be achieved using Market Basket Analysis.
4. Multiple baskets in R
Back to R. Let's create a dataset with multiple baskets.
The "Basket" column contains the ID of the basket while the "Product" column contains the item which has been purchased.
Seven baskets are created, with various basket sizes. The first few lines of the dataset show us basket 1, which contains one piece of bread and three pieces of cheese.
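As a rough sketch of what such a dataset could look like in R: only basket 1 (one Bread, three Cheese) is described in the transcript, so the contents of baskets 2 to 7 and the name "my_baskets" are assumptions made for illustration. The rows are chosen so that there are 7 baskets and 4 distinct items.

```r
# Hypothetical dataset: only basket 1 is given in the transcript,
# baskets 2 to 7 are made up for illustration.
my_baskets <- data.frame(
  Basket  = rep(1:7, times = c(4, 2, 2, 3, 1, 2, 3)),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",  # basket 1
              "Bread", "Butter",                      # basket 2
              "Cheese", "Wine",                       # basket 3
              "Bread", "Butter", "Wine",              # basket 4
              "Cheese",                               # basket 5
              "Butter", "Butter",                     # basket 6
              "Bread", "Wine", "Wine")                # basket 7
)

# First few lines: one row per purchased piece, starting with basket 1
head(my_baskets)
```

Each row is one purchased piece, so an item bought three times in the same basket appears on three rows.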
5. What's in our baskets?
Now let's have a closer look at our baskets.
We can figure out the number of distinct items as well as the number of baskets using the "n_distinct" function. In this case, we have 4 distinct items and 7 baskets.
Using the "group_by" and "summarise" functions, we can find the size of each basket as well as its number of distinct items. Basket 1 has 4 items, of which 2 are distinct.
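These counts might be computed along the following lines with dplyr. The dataset "my_baskets" is a hypothetical stand-in (only basket 1 comes from the transcript), and the column names "n_total" and "n_items" are the ones used later in the narration.

```r
library(dplyr)

# Hypothetical dataset: only basket 1 is given in the transcript
my_baskets <- data.frame(
  Basket  = rep(1:7, times = c(4, 2, 2, 3, 1, 2, 3)),
  Product = c("Bread", "Cheese", "Cheese", "Cheese", "Bread", "Butter",
              "Cheese", "Wine", "Bread", "Butter", "Wine", "Cheese",
              "Butter", "Butter", "Bread", "Wine", "Wine")
)

# Number of distinct items and number of baskets
n_products <- n_distinct(my_baskets$Product)
n_baskets  <- n_distinct(my_baskets$Basket)

# Size of each basket: total pieces and distinct items
basket_sizes <- my_baskets %>%
  group_by(Basket) %>%
  summarise(n_total = n(),
            n_items = n_distinct(Product))

basket_sizes
```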
6. How big are baskets?
To get an idea of the size of baskets, we can compute the average total number of items and the average number of distinct products in each basket. In this case, customers bought on average 2.5 products and 1.8 distinct ones.
We can additionally plot the distribution of the number of distinct items with ggplot, using "n_items" in this case, or use "n_total" to plot the distribution of the total number of items.
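A possible sketch of the averages and the plot follows. Because the dataset below is hypothetical (only basket 1 is known from the transcript), its averages will be close to, but not identical to, the 2.5 and 1.8 quoted above. The choice of a bar chart is also an assumption; the transcript only says the distribution is plotted with ggplot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical dataset: only basket 1 is given in the transcript
my_baskets <- data.frame(
  Basket  = rep(1:7, times = c(4, 2, 2, 3, 1, 2, 3)),
  Product = c("Bread", "Cheese", "Cheese", "Cheese", "Bread", "Butter",
              "Cheese", "Wine", "Bread", "Butter", "Wine", "Cheese",
              "Butter", "Butter", "Bread", "Wine", "Wine")
)

# Size of each basket: total pieces and distinct items
basket_sizes <- my_baskets %>%
  group_by(Basket) %>%
  summarise(n_total = n(),
            n_items = n_distinct(Product))

# Average total number of items and average number of distinct items
avg_sizes <- basket_sizes %>%
  summarise(avg_total = mean(n_total),
            avg_items = mean(n_items))
avg_sizes

# Distribution of the number of distinct items per basket;
# replace n_items with n_total for the total number of items
p <- ggplot(basket_sizes, aes(x = n_items)) +
  geom_bar()
print(p)
```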
7. Specific products in the baskets
Sometimes, we may want to look at a specific product or at a combination of products. You may wonder how many times an item appears across all baskets, or how many baskets actually contain that item at least once.
In R, use the function "filter" to select the specific item you want in combination with the functions used before. In this example, out of the 7 baskets, Cheese appears 5 times in total, but only in 3 of the baskets. Do you remember that in Basket 1, we took 3 pieces of Cheese?
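The Cheese example could be written roughly as follows. The dataset is a hypothetical stand-in, built so that Cheese appears 5 times in 3 baskets as stated above; the column names "n_total" and "n_baskets" in the summary are illustrative choices.

```r
library(dplyr)

# Hypothetical dataset: only basket 1 is given in the transcript
my_baskets <- data.frame(
  Basket  = rep(1:7, times = c(4, 2, 2, 3, 1, 2, 3)),
  Product = c("Bread", "Cheese", "Cheese", "Cheese", "Bread", "Butter",
              "Cheese", "Wine", "Bread", "Butter", "Wine", "Cheese",
              "Butter", "Butter", "Bread", "Wine", "Wine")
)

# Keep only the Cheese rows, then count pieces and distinct baskets
cheese_summary <- my_baskets %>%
  filter(Product == "Cheese") %>%
  summarise(n_total   = n(),                 # pieces of Cheese overall
            n_baskets = n_distinct(Basket))  # baskets with at least one

cheese_summary
```

The two counts differ because the three pieces of Cheese in basket 1 are counted three times in "n_total" but only once in "n_baskets".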
8. Association rule mining
Instead of only looking at cheese, imagine looking at all possible combinations of items! Market basket analysis is more generally known as association rule mining. Its purpose is to find frequently co-occurring items, such as Bread and Butter. This lets us draw inferences about the relationships between items and extract rules, known as association rules. For instance, it could be a rule such as Bread implies Butter, or Bread and Cheese implies Wine.
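To give a feel for what "co-occurring" means before the rules themselves are introduced, here is a minimal base R sketch that counts, for every pair of items, the number of baskets containing both. This is plain counting on a hypothetical dataset, not the rule extraction method covered in the next chapter; the names "incidence" and "co_occurrence" are illustrative.

```r
# Hypothetical dataset: only basket 1 is given in the transcript
my_baskets <- data.frame(
  Basket  = rep(1:7, times = c(4, 2, 2, 3, 1, 2, 3)),
  Product = c("Bread", "Cheese", "Cheese", "Cheese", "Bread", "Butter",
              "Cheese", "Wine", "Bread", "Butter", "Wine", "Cheese",
              "Butter", "Butter", "Bread", "Wine", "Wine")
)

# Basket-by-product matrix: 1 if the basket contains the product, else 0
counts    <- unclass(table(my_baskets$Basket, my_baskets$Product))
incidence <- (counts > 0) * 1

# Product-by-product matrix: entry [i, j] is the number of baskets
# containing both product i and product j (the diagonal is the number
# of baskets containing each single product)
co_occurrence <- crossprod(incidence)

co_occurrence
co_occurrence["Bread", "Butter"]
```

Pairs with a high count relative to the number of baskets are the natural candidates for rules such as Bread implies Butter.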
In the next chapter, we will learn more about these rules and how they are actually extracted from the data.
9. So what's coming next?
What is coming next?
In chapter 2, we introduce the metrics and techniques used to retrieve the set of rules.
Chapter 3 is devoted to visualizing the output of Market Basket Analysis.
Finally, we will apply everything we have learned to build movie recommendations on the MovieLens dataset.
10. Let's play with baskets!
Now it's your turn to play with baskets from the Online Retail dataset.