
Optimality of the support-confidence border

You return to the founder with the scatterplot produced in the previous exercise and ask whether she would like you to use pruning to recover the support-confidence border. You tell her about the Bayardo-Agrawal result, but she seems skeptical and asks whether you can demonstrate this in an example.

Recalling that scatterplots can scale the size of dots according to a third metric, you decide to use that to demonstrate the optimality of the support-confidence border. You will show this by scaling the dot size using the lift metric, which is one of the metrics to which the Bayardo-Agrawal result applies. The one-hot encoded data has been imported for you and is available as onehot. Additionally, apriori() and association_rules() have been imported and pandas is available as pd.
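As a reminder of what lift measures: for a rule A → B, lift is support(A and B) divided by support(A) times support(B), so values above 1 mean the items co-occur more often than independence would predict. The following minimal sketch computes support, confidence, and lift by hand on a small hypothetical one-hot DataFrame (the name onehot mirrors the exercise's data, but the values here are made up):

```python
import pandas as pd

# Hypothetical one-hot encoded transactions: each row is a basket,
# each column indicates whether the basket contains that item
onehot = pd.DataFrame({
    "milk":   [1, 1, 0, 1, 1],
    "bread":  [1, 1, 1, 0, 1],
    "butter": [0, 1, 0, 1, 0],
}).astype(bool)

# Support of an item(set): fraction of baskets that contain it
support_milk = onehot["milk"].mean()
support_bread = onehot["bread"].mean()
support_both = (onehot["milk"] & onehot["bread"]).mean()

# Confidence of the rule milk -> bread
confidence = support_both / support_milk

# Lift of the pair {milk, bread}
lift = support_both / (support_milk * support_bread)

print(confidence)  # 0.75
print(lift)        # 0.9375 (< 1: slightly fewer co-occurrences than independence predicts)
```

This is the same lift value that association_rules() reports for each rule, which is why it can serve as the third dimension of the scatterplot below.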

This exercise is part of the course

Market Basket Analysis in Python


Exercise instructions

  • Apply the Apriori algorithm to the DataFrame onehot.
  • Compute the association rules using the support metric and a minimum threshold of 0.0.
  • Complete the expression for the scatterplot such that the dot size is scaled by lift.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import seaborn and matplotlib under their standard aliases
import seaborn as sns
import matplotlib.pyplot as plt

# Apply the Apriori algorithm with a support value of 0.0075
frequent_itemsets = ____(____, min_support = 0.0075, 
                         use_colnames = True, max_len = 2)

# Generate association rules without performing additional pruning
rules = ____(frequent_itemsets, metric = "support", 
             min_threshold = ____)

# Generate scatterplot using support and confidence
sns.scatterplot(x = "support", y = "confidence", 
                size = "____", data = rules)
plt.show()