
Scatterplots

1. Scatterplots

Heatmaps helped us to make sense of a large number of rules between a small number of antecedents and consequents. Scatterplots will help us to evaluate general tendencies in the behavior of rules across many antecedents and consequents, but without isolating any rule in particular.

2. Introduction to scatterplots

So what is a scatterplot? It is a type of visualization that displays pairs of values.

3. Introduction to scatterplots

In market basket analysis, those values might be antecedent support and consequent support or confidence and lift. Scatterplots do not typically assume an underlying model. No trend line or fitted curve is needed. Scatterplots are useful in market basket analysis because they can provide guidance for further pruning rounds. Identifying the correct pruning thresholds may be difficult to do via trial-and-error, but looking at a scatterplot could make it clear where the relevant thresholds are located.

4. Support versus confidence

Let's take a look at an example, which makes use of association rules generated from the MovieLens dataset. For each rule, the confidence value is plotted against the support value.

5. Support versus confidence

This is not, in fact, a random choice of metrics to plot. Research by Bayardo and Agrawal in 1999 proved that the best-performing rules along a wide variety of common metrics -- including lift, conviction, confidence, support, and others not mentioned in this course -- must be located on the confidence-support border. In the plot, we can see what looks like a triangle. The points in the interior of the triangle are dominated by the points on its edges according to Bayardo-Agrawal. This suggests that we should make use of pruning to try to eliminate them.
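To make the idea of "dominated" points concrete, here is a minimal sketch in plain Python. It assumes a simplified reading of the border: a rule is on it if no other rule is at least as good on both support and confidence. The rule names and values are hypothetical and are not taken from the MovieLens data.

```python
# Hypothetical (support, confidence) pairs for five rules; illustrative only.
rules = {
    "A -> B": (0.30, 0.40),
    "C -> D": (0.20, 0.70),
    "E -> F": (0.10, 0.90),
    "G -> H": (0.15, 0.50),
    "I -> J": (0.05, 0.60),
}

def dominates(p, q):
    """True if p is at least as good as q on both metrics and is not the same point."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q

# A rule is on the (simplified) border if no other rule dominates it.
border = sorted(
    name for name, point in rules.items()
    if not any(dominates(other, point) for other in rules.values())
)
interior = sorted(set(rules) - set(border))

print("border:", border)
print("interior:", interior)
```

In this toy data, the two interior rules are both beaten on support and confidence by "C -> D", which is why they would be candidates for pruning.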

6. Generating a scatterplot

Let's create a scatterplot. We'll start by importing seaborn and pandas. We'll also need to apply Apriori and generate association rules, so we'll import the relevant libraries from mlxtend. Next, we'll load the one-hot encoded data and generate some rules. Since we want to do pruning after we view the scatterplot, we'll use low thresholds and apply them exclusively to support. Finally, we'll generate a simple scatterplot of antecedent support and consequent support using the seaborn scatterplot function. At a minimum, we must supply a value for the x variable, a value for the y variable, and the input data, which is in the form of a pandas DataFrame.
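The plotting step might look something like the sketch below. To keep it self-contained, the rules table is a small hand-built stand-in rather than real output from Apriori; the values are hypothetical, and only the column names are chosen to match those in mlxtend's rules output.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the rules DataFrame (values are made up).
rules = pd.DataFrame({
    "antecedent support": [0.05, 0.08, 0.12, 0.20, 0.24],
    "consequent support": [0.06, 0.14, 0.09, 0.22, 0.11],
    "lift": [3.1, 1.8, 1.5, 1.1, 1.2],
})

# x and y name columns of the DataFrame passed through the data argument.
ax = sns.scatterplot(x="antecedent support", y="consequent support", data=rules)
```

The call returns a matplotlib Axes object, so the figure can be displayed or saved with the usual matplotlib tools.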

7. Generating a scatterplot

What, if anything, can we learn from this scatterplot? First, no antecedent or consequent support values exceed 0.25. This means that any pruning we perform should focus on values within those bounds. And second, most values appear to be clustered below 0.15.
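Readings like these can also be checked numerically. The sketch below uses a small hypothetical rules table (not the MovieLens rules) and treats 0.15 as the candidate threshold, purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the rules DataFrame (values are made up).
rules = pd.DataFrame({
    "antecedent support": [0.05, 0.08, 0.12, 0.20, 0.24],
    "consequent support": [0.06, 0.14, 0.09, 0.22, 0.11],
})

cols = ["antecedent support", "consequent support"]

# Largest support value across both columns: the upper bound seen in the plot.
max_support = rules[cols].max().max()

# Share of rules whose antecedent AND consequent support both fall below 0.15.
share_below = (rules[cols] < 0.15).all(axis=1).mean()

print(max_support, share_below)
```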

8. Adding a third metric

In some cases, two metrics will not be sufficient to identify a relationship of interest. Rather than looking at antecedent support and consequent support, we might wonder how the picture changes when we include lift. That is, does lift have a tendency to be high or low for certain antecedent and consequent support values? We can examine this by changing the size of the dots in the scatterplot based on their lift values. The scatterplot function allows this through the use of the size parameter.
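A minimal sketch of the size parameter, again on a hypothetical hand-built rules table rather than real Apriori output:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the rules DataFrame (values are made up).
rules = pd.DataFrame({
    "antecedent support": [0.05, 0.08, 0.12, 0.20, 0.24],
    "consequent support": [0.06, 0.14, 0.09, 0.22, 0.11],
    "lift": [3.1, 1.8, 1.5, 1.1, 1.2],
})

# size maps a third column to the area of each dot: higher lift, bigger dot.
ax = sns.scatterplot(
    x="antecedent support",
    y="consequent support",
    size="lift",
    data=rules,
)

# One marker size per rule, as drawn on the axes.
dot_sizes = ax.collections[0].get_sizes()
```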

9. Adding a third metric

We've now redrawn the same plot, but with higher lift values shown as bigger dots. Immediately, we can see that the biggest dots are clustered around very low antecedent and consequent support values. Such results could be generated by a small number of users, which suggests that the high lift values might not be as informative as we would normally expect. If anything, this plot should convince us to treat very high values of lift with skepticism.
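One way to act on this observation is to flag rules that pair high lift with very low support before trusting them. The sketch below uses hypothetical data and hypothetical cutoffs (lift above 2.0, both supports below 0.10); in practice, the cutoffs would be read off the plot.

```python
import pandas as pd

# Hypothetical stand-in for the rules DataFrame (values are made up).
rules = pd.DataFrame({
    "antecedent support": [0.05, 0.08, 0.12, 0.20, 0.24],
    "consequent support": [0.06, 0.14, 0.09, 0.22, 0.11],
    "lift": [3.1, 1.8, 1.5, 1.1, 1.2],
})

# Assumed cutoffs, chosen only for illustration.
low_support = (rules["antecedent support"] < 0.10) & (rules["consequent support"] < 0.10)
high_lift = rules["lift"] > 2.0

# Rules with high lift built on very little data deserve a second look.
suspicious = rules[low_support & high_lift]
remaining = rules[~(low_support & high_lift)]

print(len(suspicious), len(remaining))
```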

10. What can we learn from scatterplots?

So what do we learn from scatterplots? First, they allow us to identify natural thresholds in the data that would be difficult to discover via trial-and-error. And second, they allow us to visualize the entire dataset, which is infeasible using a heatmap. Both of these benefits will allow us to refine the pruning process, so that we can identify better rules.

11. Let's practice!

We now know how to generate scatterplots using seaborn. Let's put those skills to work in some exercises!
