
Market basket introduction

1. Market basket introduction

Welcome to the course Market Basket Analysis in R. My name is Christopher Bruffaerts. I have worked in data science for many years in various sectors such as banking, telecommunications and energy, and I have often come across Market Basket Analysis, the topic of this course.

2. Overview

First, we will introduce the concepts and intuitions behind Market Basket Analysis in Chapter 1. Then, in Chapter 2, we'll cover the main metrics used in Market Basket Analysis and introduce the Apriori algorithm. Chapter 3 will be dedicated to visualizing and understanding the outcomes of our analysis. Finally, in Chapter 4, we will go beyond retail and apply Market Basket Analysis to create movie recommendations using the MovieLens data.

3. What is a basket?

Let's start off with the basket. A basket is simply a collection of items. In the offline world, you can think of an item as a product from the supermarket, such as bread. In the online world, it could be a product on an e-commerce website, a DataCamp course, or even the last movie you streamed. A basket can then be your cart at the grocery store, your Amazon shopping cart, all the courses you've completed on DataCamp, or all the movies you've watched on Netflix.

4. Grocery store example

Let's take the grocery store example. Imagine the store you go to has only four items: bread, butter, cheese and wine. You decide to buy one piece of bread and three pieces of cheese, which leaves you with a basket of 4 items in total, 2 of which are distinct.

5. Grocery store example in R

Now let's see how this looks in R. Let's randomly pick 4 items from the store. We first set the seed for reproducibility. We then create a data frame with two columns: first, a transaction ID set to 1 to denote that these items belong to the same basket, and second, the chosen products. In this simulation run, we pick 1 piece of bread and 3 pieces of cheese. Market Basket Analysis focuses on which products are bought rather than on how many of each.
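A minimal sketch of this simulation in R might look like the following. The seed value and the column names `TID` and `Product` are assumptions for illustration; the transcript does not say which seed yields 1 bread and 3 cheese, so the products drawn here may differ from the slide.

```r
# The four products available in the store
store <- c("Bread", "Butter", "Cheese", "Wine")

# Hypothetical seed, chosen only for reproducibility of this sketch
set.seed(1234)

# One basket (transaction ID 1) containing 4 items drawn with replacement,
# so the same product can appear several times
my_basket <- data.frame(
  TID = rep(1, 4),
  Product = sample(store, 4, replace = TRUE),
  stringsAsFactors = FALSE
)

print(my_basket)
```

Sampling with `replace = TRUE` is what allows duplicates such as several pieces of cheese in one basket.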

6. What's in my basket?

Now that you have chosen your products from the store, it is time to go to the checkout. Will your bill actually look like the data frame on the left-hand side of the slide? Not quite! It will rather look like the one on the right-hand side, which lists each distinct item purchased together with its quantity.

7. What's in my R basket?

In R, how do we obtain this new dataset containing a single row per item purchased? The "add_count" function from dplyr adds a column with the number of times each product appears, that is, its quantity, and the "unique" function then keeps one row per product. With "n_distinct" we get the number of distinct items purchased, while "summarise" combined with "sum" gives the total basket size.
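Here is a sketch of these steps with dplyr. To keep the result independent of any random seed, the basket of 1 bread and 3 cheese is hard-coded; the column names `TID` and `Product` are the same illustrative assumptions as before, and `n` is the column name that `add_count` creates by default.

```r
library(dplyr)

# Basket of 1 bread and 3 cheese, hard-coded for a deterministic example
my_basket <- data.frame(
  TID = rep(1, 4),
  Product = c("Bread", "Cheese", "Cheese", "Cheese"),
  stringsAsFactors = FALSE
)

# add_count() appends a column `n` holding the number of rows per Product
# (its quantity); unique() then keeps one row per product, like a receipt
basket_summary <- my_basket %>%
  add_count(Product) %>%
  unique()

# Number of distinct items in the basket: Bread and Cheese
n_items <- n_distinct(my_basket$Product)

# Total basket size: the sum of the per-product quantities
basket_size <- basket_summary %>%
  summarise(total = sum(n))

print(basket_summary)
```

Here `basket_summary` has one row for Bread with `n = 1` and one for Cheese with `n = 3`, `n_items` is 2, and the total basket size is 4.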

8. Visualizing items in my basket

Time to visualize what we have in our basket. The "geom_col" function from the ggplot2 package displays a bar chart, and the "reorder" function sorts the items by their quantity.
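A possible version of this chart, assuming a summary table with a `Product` column and a quantity column `n` like the one produced by `add_count`, is sketched below. The `coord_flip()` call and the axis labels are stylistic choices of this sketch, not something the transcript specifies.

```r
library(ggplot2)

# Per-product quantities for the basket of 1 bread and 3 cheese
basket_summary <- data.frame(
  Product = c("Bread", "Cheese"),
  n = c(1L, 3L)
)

# reorder() sorts the products by quantity; geom_col() draws bar heights
# taken directly from n; coord_flip() makes the bars horizontal
p <- ggplot(basket_summary, aes(x = reorder(Product, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Product", y = "Quantity")

print(p)
```

`geom_col` is used rather than `geom_bar` because the quantities are already computed; `geom_bar` would count rows itself.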

9. Why are we looking at my basket?

Why look at a basket at all? Because the items in your basket may be related to one another. Let's go back to the examples mentioned earlier. In the supermarket setting, what are the odds of having both spaghetti and tomato sauce in the same basket? In an e-commerce setting, how likely are you to have a phone and a phone case in your shopping cart? In a DataCamp student's basket, would it be surprising to find both "Introduction to R" and "Intermediate R"? Market Basket Analysis helps us answer these questions and uncover such patterns.

10. Happy shopping!

Now it is your turn to get familiar with items and bundles using the Online Retail dataset!
