Scatterplots
1. Scatterplots
Heatmaps helped us to make sense of a large number of rules between a small number of antecedents and consequents. Scatterplots will help us to evaluate general tendencies in the behavior of rules across many antecedents and consequents, without isolating any rule in particular.
2. Introduction to scatterplots
So what is a scatterplot? It is a type of visualization that displays pairs of values.
3. Introduction to scatterplots
In market basket analysis, those values might be antecedent support and consequent support or confidence and lift. Scatterplots do not typically assume an underlying model. No trend line or fitted curve is needed. Scatterplots are useful in market basket analysis because they can provide guidance for further pruning rounds. Identifying the correct pruning thresholds may be difficult to do via trial-and-error, but looking at a scatterplot could make it clear where the relevant thresholds are located.
4. Support versus confidence
Let's take a look at an example, which makes use of association rules generated from the MovieLens dataset. For each rule, the confidence value is plotted against the support value.
5. Support versus confidence
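To make the plotted quantities concrete, here is a minimal sketch of how one (support, confidence) pair per rule could be computed by hand. The transactions and item names are hypothetical toy values, not the MovieLens data, and the `support` helper is introduced only for illustration.

```python
from itertools import permutations

# Hypothetical toy transactions; the course itself uses MovieLens data.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B"},
]
n = len(transactions)


def support(items):
    """Share of transactions that contain every item in `items`."""
    return sum(items <= t for t in transactions) / n


# One (support, confidence) pair per single-item rule "antecedent -> consequent".
points = {}
for a, c in permutations(["A", "B", "C"], 2):
    rule_support = support({a, c})
    confidence = rule_support / support({a})
    points[(a, c)] = (rule_support, confidence)

for rule, (s, conf) in sorted(points.items()):
    print(f"{rule[0]} -> {rule[1]}: support={s:.2f}, confidence={conf:.2f}")
```

For the rule A -> B, for instance, this gives a support of 0.60 and a confidence of 0.75. Each such pair becomes one dot in a support-versus-confidence scatterplot.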
This is not, in fact, a random choice of metrics to plot. Research by Bayardo and Agrawal in 1999 proved that the best-performing rules across a wide variety of common metrics (including lift, conviction, confidence, support, and others not mentioned in this course) must be located on the confidence-support border. In the plot, we can see what looks like a triangle. According to Bayardo and Agrawal, the points in the interior of the triangle are dominated by the points on its edges. This suggests that we should use pruning to try to eliminate the interior points.
6. Generating a scatterplot
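The idea of one rule dominating another in the support-confidence discussion above can be sketched in a few lines. This is a simplified illustration under assumed, made-up (support, confidence) pairs, not the procedure from the Bayardo and Agrawal paper; the function name `sc_border` is hypothetical, and it only keeps the upper-right border of the point cloud.

```python
def sc_border(points):
    """Return the points not dominated in both support and confidence.

    A point is dominated when another point is at least as good on both
    metrics and strictly better on at least one of them.
    """
    border = []
    for s, c in points:
        dominated = any(
            (s2 >= s and c2 >= c) and (s2 > s or c2 > c)
            for s2, c2 in points
        )
        if not dominated:
            border.append((s, c))
    return border


# Hypothetical (support, confidence) pairs, one per rule.
rules = [(0.05, 0.90), (0.10, 0.70), (0.20, 0.40), (0.08, 0.60), (0.15, 0.30)]
border = sc_border(rules)
print(border)  # [(0.05, 0.9), (0.1, 0.7), (0.2, 0.4)]
```

The two rules that drop out, (0.08, 0.60) and (0.15, 0.30), each have a neighbor with both higher support and higher confidence, which is the sense in which interior points are candidates for pruning.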
Let's create a scatterplot. We'll start by importing seaborn and pandas. We'll also need to apply Apriori and generate association rules, so we'll import the relevant functions from mlxtend. Next, we'll load the one-hot encoded data and generate some rules. Since we want to do pruning after we view the scatterplot, we'll use low thresholds and apply them exclusively to support. Finally, we'll generate a simple scatterplot of antecedent support and consequent support using the seaborn scatterplot function. At a minimum, we must supply a value for the x variable, a value for the y variable, and the input data, which is in the form of a pandas DataFrame.
7. Generating a scatterplot
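A minimal sketch of the plotting step might look like the following. To stay self-contained it skips the Apriori and rule-generation step and instead builds a small `rules` DataFrame by hand; the values are made up, and the column names are assumed to match those that mlxtend's rule output uses. It is not the plot discussed in this course.

```python
import matplotlib

matplotlib.use("Agg")  # Render off-screen so the sketch also runs without a display.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rule metrics; in the course these come from mlxtend, not by hand.
rules = pd.DataFrame({
    "antecedent support": [0.05, 0.08, 0.12, 0.20, 0.24],
    "consequent support": [0.10, 0.06, 0.14, 0.09, 0.22],
    "lift": [3.1, 4.0, 1.8, 1.2, 1.1],
})

fig, ax = plt.subplots()

# x, y, and data are the minimum inputs: two column names and the DataFrame.
sns.scatterplot(x="antecedent support", y="consequent support", data=rules, ax=ax)
```

In an interactive session, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` would display the figure.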
What, if anything, can we learn from this scatterplot? First, no antecedent or consequent support values exceed 0.25. This means that any pruning we perform should focus on values within those bounds. And second, most values appear to be clustered below 0.15.
8. Adding a third metric
In some cases, two metrics will not be sufficient to identify a relationship of interest. Rather than looking at antecedent support and consequent support, we might wonder how the picture changes when we include lift. That is, does lift have a tendency to be high or low for certain antecedent and consequent support values? We can examine this by changing the size of the dots in the scatterplot based on their lift values. The scatterplot function allows this through the use of the size parameter.
9. Adding a third metric
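As a rough sketch, passing a column name to the size parameter is enough to scale the dots by lift. The `rules` DataFrame below is again hypothetical, with made-up values and column names assumed to match mlxtend's rule output.

```python
import matplotlib

matplotlib.use("Agg")  # Render off-screen so the sketch also runs without a display.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rule metrics, for illustration only.
rules = pd.DataFrame({
    "antecedent support": [0.05, 0.08, 0.12, 0.20, 0.24],
    "consequent support": [0.10, 0.06, 0.14, 0.09, 0.22],
    "lift": [3.1, 4.0, 1.8, 1.2, 1.1],
})

fig, ax = plt.subplots()

# size="lift" maps each rule's lift value to the area of its dot.
sns.scatterplot(
    x="antecedent support",
    y="consequent support",
    size="lift",
    data=rules,
    ax=ax,
)
```

With the default settings, seaborn also adds a legend that relates dot size to lift.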
We've now re-drawn the same plot, but allowed high lift values to be associated with bigger dots. Immediately, we can see that the biggest dots are clustered around very low antecedent and consequent support values. Such results could be generated by a small number of users, which suggests that the high lift values might not be as informative as we would normally expect. If anything, this plot should convince us to treat very high values of lift with skepticism.
10. What can we learn from scatterplots?
So what do we learn from scatterplots? First, they allow us to identify natural thresholds in the data that would be difficult to discover via trial-and-error. And second, they allow us to visualize the entire dataset, which is infeasible using a heatmap. Both of these benefits will allow us to refine the pruning process, so that we can identify better rules.
11. Let's practice!
We now know how to generate scatterplots using seaborn. Let's put those skills to work in some exercises!