Heatmaps

1. Heatmaps

A recurring theme throughout the course has been that market basket analysis often leaves us with many rules. When you find yourself unable to select among these rules, creating an informative visualization is often the best way to move forward. In this chapter, we'll look at three different visualization techniques for association rules, starting with heatmaps. We'll also make use of a new dataset from MovieLens.

2. MovieLens dataset

The MovieLens dataset is similar to the GoodBooks dataset in that it contains user ratings, rather than transactions. We'll use movies that have a rating of four or higher, which we'll load as the DataFrame ratings.

3. Creating "transactions" from ratings

We can transform such a DataFrame into a transaction-like object by creating a list of lists -- called "library" here -- where each element of this list is a list of movies watched and highly rated by a unique user. For example, here are the movies one such user rated highly.
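As a rough sketch of this step, the snippet below builds a list of lists from a tiny stand-in for the ratings DataFrame. The column names userId, title, and rating, and the toy rows themselves, are assumptions made for illustration; the real MovieLens data may be laid out differently.

```python
import pandas as pd

# Toy stand-in for the ratings DataFrame (column names are assumed).
ratings = pd.DataFrame(
    {
        "userId": [1, 1, 2, 2, 2, 3],
        "title": [
            "The Matrix",
            "The Dark Knight",
            "The Matrix",
            "Inception",
            "The Dark Knight",
            "Inception",
        ],
        "rating": [5.0, 4.0, 4.5, 4.0, 3.0, 5.0],
    }
)

# Keep only highly rated movies (a rating of four or higher).
liked = ratings[ratings["rating"] >= 4]

# Collect one list of titles per unique user, then gather them in a list of lists.
library = liked.groupby("userId")["title"].apply(list).tolist()

# Movies that the first user rated highly.
print(library[0])
```

If the ratings have already been filtered to four or higher, the filtering line simply leaves the data unchanged.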

4. One-hot encoding transactions

We can then combine those lists into a list of lists and one-hot encode it.

5. One-hot encoding transactions

We now have data in the standard format we'll use throughout the chapter. Each column corresponds to a movie. Each row corresponds to a unique user's library. A true value indicates that the user watched the movie and rated it highly.

6. What is a heatmap?

So what is a heatmap and how can it help us to understand the association rules we generate for movies? A heatmap visualizes the intensity of the relationships between pairs of objects. This is useful for our purposes because market basket analysis involves the generation of rules between antecedent-consequent pairs. Those rules can be evaluated using metrics, each of which measures the intensity of some dimension of the relationship. Heatmaps take the form of a matrix of colors. In the matrix shown, black is associated with a low intensity, whereas white is associated with a high intensity. If we pick a cell in the matrix, such as the 0th row and 17th column, we can see that the color is black, indicating a low intensity. And we can do this for any pair of objects.

7. Preparing the data

If we want to generate a heatmap from the one-hot encoded libraries data, we'll need to do three additional things. First, generate the rules. Second, convert the antecedents and consequents from frozen sets into strings. And third, convert the rules into a matrix format that is suitable for use in a heatmap.

8. Preparing the data

Let's start by performing the necessary imports. Notice that we've included a new one: seaborn as sns. We'll use the seaborn library to create heatmaps. The rule generation step involves applying the Apriori algorithm to find frequent itemsets and then computing association rules from them.

9. Generating a heatmap

Next, we'll convert the antecedents and consequents into strings using lambda functions. Note that the antecedents and consequents are stored as frozen sets in mlxtend. A frozen set is simply a collection of unique items that cannot be modified after it is created. The lambda function takes each antecedent or consequent, converts it into a list, and then joins the list elements together into a string, separating them with a comma if there is more than one. In our case, we'll be considering itemsets with a single item, so joining with a comma is not strictly necessary. Furthermore, note that you could still produce a heatmap without taking this second step, but the labels would be frozen sets, rather than strings.
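A minimal sketch of the conversion follows. The rules DataFrame here is built by hand to mimic mlxtend's output, so its contents are assumptions for illustration.

```python
import pandas as pd

# Hand-built stand-in for the rules DataFrame returned by mlxtend.
rules = pd.DataFrame(
    {
        "antecedents": [frozenset({"The Matrix"}), frozenset({"The Dark Knight"})],
        "consequents": [frozenset({"The Dark Knight"}), frozenset({"The Matrix"})],
        "support": [0.4, 0.4],
    }
)

# Convert each frozen set to a list, then join its items into a single string.
rules["antecedents"] = rules["antecedents"].apply(lambda a: ",".join(list(a)))
rules["consequents"] = rules["consequents"].apply(lambda c: ",".join(list(c)))

print(rules)
```

For itemsets with more than one item, replacing list with sorted would give a stable label order, since frozen sets have no guaranteed ordering.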

10. Generating a heatmap

We next use the pivot method of DataFrames to reshape our data into a matrix. We'll set the index parameter to consequents, the columns parameter to antecedents, and the values to support. This will generate a matrix of support values with antecedents on the horizontal axis and consequents on the vertical. Note that we could have used any metric other than support, such as confidence or lift. Finally, we pass the matrix we generated to seaborn's heatmap function.
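The reshaping and plotting steps can be sketched as follows. The rules and their support values are made up for illustration, and the name support_table is an assumption.

```python
import pandas as pd
import seaborn as sns

# Hand-built rules with string labels and made-up support values.
rules = pd.DataFrame(
    {
        "antecedents": [
            "The Fellowship of the Ring",
            "The Two Towers",
            "The Matrix",
            "The Dark Knight",
        ],
        "consequents": [
            "The Two Towers",
            "The Fellowship of the Ring",
            "The Dark Knight",
            "The Matrix",
        ],
        "support": [0.6, 0.6, 0.4, 0.4],
    }
)

# Reshape into a matrix: consequents as rows, antecedents as columns.
support_table = rules.pivot(
    index="consequents", columns="antecedents", values="support"
)

# Draw the heatmap; pairs without a rule are NaN and are left unshaded.
ax = sns.heatmap(support_table)

# In a script, call matplotlib.pyplot.show() afterwards to display the figure.
```

Swapping values="support" for "confidence" or "lift" would work the same way, provided that column is present in the rules DataFrame.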

11. Generating a heatmap

We've now generated a heatmap. Notice that the columns are antecedents and the rows are consequents. Lighter colors indicate higher support. Gray cells indicate that no association rule was identified for that itemset. We can immediately see high support for pairs of movies in the Lord of the Rings trilogy, which is not surprising. We can also see a high support value for The Matrix and The Dark Knight.

12. Customizing heatmaps

Generating a heatmap was simple, but what if we want to customize it? We can set the annot parameter to True, which will add the numerical values to the cells of the matrix. We can set the cbar parameter to False, which will remove the color bar from the plot. And we can adjust the colors used with the cmap parameter.
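Here is a short sketch of those customizations using seaborn's annot, cbar, and cmap parameters. The data and the choice of the "viridis" color map are assumptions for illustration.

```python
import pandas as pd
import seaborn as sns

# Hand-built rules with made-up support values.
rules = pd.DataFrame(
    {
        "antecedents": ["The Matrix", "The Dark Knight"],
        "consequents": ["The Dark Knight", "The Matrix"],
        "support": [0.4, 0.4],
    }
)
support_table = rules.pivot(
    index="consequents", columns="antecedents", values="support"
)

# annot=True writes each value in its cell, cbar=False hides the color bar,
# and cmap selects the color palette.
ax = sns.heatmap(support_table, annot=True, cbar=False, cmap="viridis")
```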

13. Let's practice!

You now know how to create heatmaps. Let's practice creating and analyzing them in some exercises.