Item combinations

1. Item combinations

Welcome back! In the last lesson, we discussed the concept of baskets and items. In this lesson, we will discuss how many distinct baskets can be created from a given set of items.

2. Back to the grocery store

Let's go back to the grocery store. Recall that you previously picked up one piece of bread and three pieces of cheese. In Market Basket Analysis, however, the interest is not in how much has been purchased but in what has been purchased. What matters to us is that the two items, bread and cheese, were bought together.

3. Subsets and supersets

Let's understand this further by first explaining sets, subsets, and supersets. Imagine you could only buy wine, cheese, bread, and butter from the store. The set X of items in our store would be all the unique items present in it: {wine, cheese, bread, butter}. Realistically, a store has many more items, so Market Basket Analysis is often the study of smaller sets that are contained in the item set. Such a smaller set is called a subset. For example, if one buys bread and wine, the basket {bread, wine} is a subset of X of size 2. The empty set is, by definition, also a subset of X. You can think of it as the situation where you go to the store but do not actually buy anything. Conversely, a superset is a set containing all elements of a smaller set. For instance, the set {bread, butter} is a superset of the set {bread}, as the latter is contained in the former.
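The subset and superset relations above can be checked in a few lines of base R. This is a minimal sketch: the helper name "is_subset" is hypothetical and not part of the lesson, and baskets are represented as plain character vectors.

```r
# Item set X for the toy store from the lesson
X <- c("wine", "cheese", "bread", "butter")

# Hypothetical helper: TRUE when every element of `a` is also in `b`
is_subset <- function(a, b) all(a %in% b)

basket <- c("bread", "wine")

is_subset(basket, X)                      # {bread, wine} is a subset of X
length(basket)                            # size of that subset
is_subset(character(0), X)                # the empty set is a subset of X
is_subset("bread", c("bread", "butter"))  # {bread, butter} is a superset of {bread}
is_subset(X, basket)                      # X is not a subset of the basket
```

Because "all()" of an empty logical vector is TRUE, the empty basket is treated as a subset of X without a special case.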

4. Itemset graph

Can we display all subsets of a given set of items? Yes, with an itemset graph. For example, with items A, B, C, and D, the itemset graph presents a tree-like structure containing all subsets, organised by subset size. Imagine how big the itemset graph is for a set containing thousands of items.
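To get a feel for the levels of such a graph, one could enumerate the subsets of A, B, C, D by size with base R's "combn". This is only a sketch: the helper "subsets_of_size" is a hypothetical name, and the empty set is handled separately because "combn" returns an empty list for size 0.

```r
items <- c("A", "B", "C", "D")

# Hypothetical helper: all subsets of `items` with exactly k elements
subsets_of_size <- function(items, k) {
  if (k == 0) return(list(character(0)))  # the empty set
  combn(items, k, simplify = FALSE)
}

# One list entry per level of the graph, for k = 0, 1, ..., 4
levels_by_size <- lapply(0:length(items), function(k) subsets_of_size(items, k))
names(levels_by_size) <- paste0("size_", 0:length(items))

# Number of itemsets on each level
counts <- vapply(levels_by_size, length, integer(1))
counts
```

Printing "levels_by_size$size_2" lists the itemsets on the level of size 2.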

5. Intersections and unions

Let's look at typical operations from set theory. The intersection of two sets includes all elements that are common to both sets. For instance, butter is the element common to the sets {bread, butter} and {butter, wine}. The union of two sets includes all elements from both sets. For example, the set containing bread and butter is the union of the two individual sets {bread} and {butter}. In R, using the "dplyr" package, it is easy to use the "intersect" and "union" functions.
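As a small illustration of the two operations, here is a sketch using the base R functions "intersect" and "union", so that no package needs to be loaded. The variable names are chosen for this example only.

```r
a <- c("bread", "butter")
b <- c("butter", "wine")

# Elements common to both sets
common <- intersect(a, b)
common

# All elements from both individual sets
both <- union("bread", "butter")
both
```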

6. How many baskets of size k?

Now that you are familiar with sets and subsets, let's try counting the number of possible baskets. How many possible subsets of size k can be created from a set of size n? The answer is "n choose k", also known as the binomial coefficient. It is n factorial divided by the product of (n-k) factorial and k factorial. For instance, suppose you are at a store which has a total of 4 items. How many possible baskets of size 2 can be created? The answer is "4 choose 2", which equals 6. There are therefore 6 possible pairs of products from the set of 4 items. Can you figure out which ones they are? Do not worry about the math here, R will make it easier for you to count.
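The factorial formula can be checked against R's built-in "choose" for the store with 4 items and baskets of size 2. A minimal sketch, with variable names chosen for illustration:

```r
n <- 4  # number of items in the store
k <- 2  # basket size

# n! / ((n - k)! * k!)
by_formula <- factorial(n) / (factorial(n - k) * factorial(k))

# Built-in binomial coefficient
by_choose <- choose(n, k)

c(by_formula = by_formula, by_choose = by_choose)
```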

7. How many possible baskets?

We now want to figure out how many baskets can be created in total. This involves summing up the number of baskets of size 0 (the empty basket), 1, 2, and so on up to n. For a fixed value of n, if we sum all binomial coefficients for k ranging from 0 to n, we obtain 2 to the power of n, following Newton's binomial formula. For instance, the total number of baskets that can be created from a set of 4 items is 2 to the power of 4, which is 16.
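The identity can be verified numerically for n equal to 4. This sketch relies on "choose" being vectorised over k:

```r
n <- 4

# Binomial coefficients for k = 0, 1, ..., n
per_size <- choose(n, 0:n)

# Their sum should match 2^n
total <- sum(per_size)
c(total = total, two_to_the_n = 2^n)
```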

8. How many baskets in R?

In R, you can use the "choose" function to compute "n choose k". With 4 items and a basket size of 2, we obtain 6 as before. To obtain values for all basket sizes, we loop through all possible values of k. Note that the matrix used to store results has 5 rows instead of 4 to include the empty set in the computation. On the right-hand side, we get the corresponding values.
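The slide code is not reproduced in this transcript, so the following is only a sketch of what such a loop could look like. The names "n_items" and "store" and the column labels are assumptions made for this example.

```r
n_items <- 4

# n_items + 1 rows: one per basket size k = 0, 1, ..., n_items,
# so that the empty set (k = 0) is included
store <- matrix(NA, nrow = n_items + 1, ncol = 2,
                dimnames = list(NULL, c("k", "n_baskets")))

for (k in 0:n_items) {
  # Row k + 1 holds the basket size and the number of baskets of that size
  store[k + 1, ] <- c(k, choose(n_items, k))
}

store
```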

9. Plotting number of combinations

Last, let us plot the number of subsets as a function of the subset size. We first define the "choose" function that will be plotted. We then define an empty dataframe and use "stat_function" from "ggplot2" to plot our function. As you can see, the plot is symmetric with respect to n divided by 2, here 25 (so n is 50).
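A plot along these lines could be sketched as follows, assuming the "ggplot2" package is installed. This is not the slide code: the wrapper name "n_subsets" is hypothetical, n is taken to be 50 from the stated axis of symmetry, and a two-row data frame is used here (rather than an empty one) simply to set the range of the x axis.

```r
library(ggplot2)

n_items <- 50  # assumed from the axis of symmetry, n / 2 = 25

# Hypothetical wrapper: number of subsets of size k.
# k is rounded because choose() expects an integer k.
n_subsets <- function(k) choose(n_items, round(k))

# The data frame only provides the x range 0..n_items;
# n = n_items + 1 evaluates the function at k = 0, 1, ..., 50
p <- ggplot(data.frame(k = c(0, n_items)), aes(x = k)) +
  stat_function(fun = n_subsets, n = n_items + 1) +
  labs(x = "Subset size k", y = "Number of subsets")

p
```

The symmetry comes from the fact that choosing k items to put in the basket is the same as choosing the n - k items to leave out, so "n_subsets(10)" and "n_subsets(40)" agree.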

10. Are you ready to count?

Now it is your turn to put your skills into practice.
