Recap on transactions
1. Recap on transactions
Welcome back to the last chapter of this course. Whether online or offline, companies collect customer preference information to learn more about their customers and serve them as well as possible. In this chapter, we will combine everything you've learned so far, go beyond retail, and apply Market Basket Analysis to movie recommendations.
2. Important points in market basket analysis
Remember that the focus of Market Basket Analysis is to figure out what customers are consuming, not how much. In the retail case, it doesn't matter whether a customer bought 10 bananas or 1 banana; what matters is that they bought at least one. Looking at the other products in their basket is what provides the insight. Similarly, it doesn't matter whether a customer watched the same movie once or 10 times; we want to understand which movies they are likely to watch given the movies they've seen so far. Recall the three main metrics: support, confidence and lift. Finally, be careful whenever you want to inspect or display a large number of rules: start with a subset, or use "head" or "tail", so that you don't crash your R session.
3. Groceries dataset
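As a rough sketch of the last two points, the snippet below mines rules from the Groceries data used in this recap and looks at only the top few. The support and confidence thresholds are illustrative assumptions, not values taken from the course.

```r
library(arules)
data("Groceries")

# Thresholds are illustrative assumptions; adjust them to your data.
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5),
                 control = list(verbose = FALSE))

# Support, confidence and lift are stored as quality measures of each rule.
head(quality(rules))

# Inspect only the top rules by lift instead of printing the whole rule set.
inspect(head(sort(rules, by = "lift"), 5))
```

Sorting first and then taking the head keeps the output small even when the rule set is large.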
To recall the most important principles of Market Basket Analysis, let's go back to the Groceries dataset that ships with the "arules" package. This dataset contains one month of real-world point-of-sale transaction data from a typical local grocery outlet. After loading the "arules" package, we load the "Groceries" dataset. Note that the dataset is already in transactional format. Let's take a first look at its summary.
4. Summary of Groceries
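Loading the package and the dataset and asking for the summary takes only a few lines. This is a minimal sketch that assumes the "arules" package is already installed.

```r
library(arules)

# Groceries ships with arules and is already a transactions object.
data("Groceries")
class(Groceries)

# Number of transactions and items, density, most frequent items,
# and the distribution of basket sizes.
summary(Groceries)
```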
From the "summary" of the transactional dataset, we learn that it contains 9835 transactions and that the items are aggregated into categories. For instance, the item "frankfurter" belongs to the category "sausage", which itself belongs to the larger category "meat and sausage". The most frequent items in this dataset are typical products that customers buy on a weekly basis, such as milk, vegetables or yogurt.
5. Density of Groceries
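The summary also reports the density of the item matrix. As a sketch, the same figure can be recomputed by hand and compared with a plot of 200 sampled transactions. The seed is an arbitrary choice made here for reproducibility.

```r
library(arules)
data("Groceries")

# Density = filled cells / all cells of the transaction-by-item matrix.
n_filled <- sum(size(Groceries))
density  <- n_filled / (nrow(Groceries) * ncol(Groceries))
round(density, 3)

# Plot a random sample of 200 transactions rather than the full matrix.
set.seed(42)  # arbitrary seed, an assumption of this sketch
image(sample(Groceries, 200))
```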
Let's get a visual representation of the density of the item matrix. Plotting the full item matrix would give a plot that is difficult to read, given the large number of transactions. Instead, let's take a random sample of 200 transactions and call the "image" function on it. The fraction of black cells among all cells of the sampled item matrix should be close to the overall density. In this case, the density of the item matrix is 2.6%.
6. Most and least popular items
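One possible way to draw both views of item popularity is sketched below. Using "itemFrequency" to get the counts for the bar plot is a choice made here, and the margins and titles are arbitrary.

```r
library(arules)
data("Groceries")

# Ten most frequent items, shown as absolute counts.
itemFrequencyPlot(Groceries, topN = 10, type = "absolute",
                  main = "Most popular items")

# Ten least frequent items: sort the counts in increasing order
# and keep the first ten.
least <- sort(itemFrequency(Groceries, type = "absolute"))[1:10]
par(mar = c(4, 10, 2, 1))  # widen the left margin for the item labels
barplot(least, horiz = TRUE, las = 1, main = "Least popular items")
```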
Let's now see visually which items are the most and least popular in the Groceries transactional dataset. For the most popular items, we can use the "itemFrequencyPlot" function and set "topN" to the number of items to display. For the least popular items, however, we have to fall back on the traditional "barplot" function, with a few extra options, to display the least purchased items.
7. Cross tables by index
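A minimal sketch of the cross table accessed by index follows. The 4-by-4 corner is just a convenient size to print.

```r
library(arules)
data("Groceries")

# Joint counts for every pair of items; the diagonal holds the item counts.
tbl <- crossTable(Groceries)
tbl[1:4, 1:4]

# Same table with items ordered by frequency.
tbl_sorted <- crossTable(Groceries, sort = TRUE)
tbl_sorted[1:4, 1:4]
```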
Another useful function is "crossTable". It returns the joint counts of pairs of items and can hint at relationships between them. The cross table is a matrix that is symmetric around its diagonal. In this example, 99 transactions contain both "frankfurter" and "sausage". Given the large number of items, we can also sort the items in the matrix by frequency. To do so, just add "sort = TRUE" to the "crossTable" call.
8. Cross tables by item names
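The same table can be indexed by item label, and the measure can be switched from counts to lift. In this sketch, "whole milk" is an assumption about which Groceries label the transcript's "milk" refers to.

```r
library(arules)
data("Groceries")

tbl <- crossTable(Groceries)

# Number of baskets containing both items, looked up by label.
# "whole milk" is an assumed label for the item called "milk" in the text.
tbl["whole milk", "flour"]

# Lift instead of counts, with items sorted by frequency.
lift_tbl <- crossTable(Groceries, measure = "lift", sort = TRUE)
lift_tbl[1:4, 1:4]
```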
Instead of using indices for the cross table, we can also use item names. For example, you can get the number of baskets containing both "milk" and "flour" directly by using the item names. In this case, there are 83 baskets containing both items. Going further, you can perform a chi-square test. The value returned is the p-value of the chi-square test. In this case, the low p-value indicates a significant relationship between the two items. Instead of displaying counts in the cross table, it is also possible to fill the matrix with a different metric. For example, keeping the sorting option, we can display the "lift" measure.
9. MovieLens dataset
Now that you've had a refresher on how to deal with transactions using the "Groceries" data, you're going to apply what you learned to the "movielens" dataset, where you will inspect movie consumption data and create transactions out of it. This is the first step in building a movie recommendation system with Market Basket Analysis.
10. Let's watch movies!
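Turning viewing records into transactions could look roughly like the sketch below. The data frame and its column names ("userId", "title") are hypothetical stand-ins, not the actual layout of the movielens data.

```r
library(arules)

# Hypothetical toy data: one row per viewing event.
ratings <- data.frame(
  userId = c(1, 1, 1, 2, 2, 3, 3, 3),
  title  = c("Alien", "Heat", "Alien", "Heat", "Fargo",
             "Alien", "Fargo", "Heat"),
  stringsAsFactors = FALSE
)

# Repeat viewings carry no extra information here, so drop duplicates.
watched <- unique(ratings[, c("userId", "title")])

# One transaction per user, holding the set of movies that user has seen.
movie_trans <- as(split(watched$title, watched$userId), "transactions")
inspect(movie_trans)
```

Dropping duplicates first mirrors the point made earlier in the recap: what matters is whether a movie was watched, not how often.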
Aren't you curious to find out what kind of movies users are watching? Let's get started!