Optimality of the support-confidence border
You return to the founder with the scatterplot produced in the previous exercise and ask whether she would like you to use pruning to recover the support-confidence border. You tell her about the Bayardo-Agrawal result, but she seems skeptical and asks whether you can demonstrate it with an example.
Recalling that scatterplots can scale the size of dots according to a third metric, you decide to use that to demonstrate the optimality of the support-confidence border. You will show this by scaling the dot size using the lift metric, which is one of the metrics to which the Bayardo-Agrawal result applies. The one-hot encoded data has been imported for you and is available as onehot. Additionally, apriori() and association_rules() have been imported and pandas is available as pd.
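In case it helps to picture the input, here is a minimal sketch of what a one-hot encoded transaction DataFrame like onehot might look like. The items and transactions below are invented purely for illustration; the real onehot DataFrame is already loaded in the exercise environment.

import pandas as pd

# Hypothetical one-hot encoded transactions: one row per transaction,
# one boolean column per item (True if the item appears in that basket)
onehot_example = pd.DataFrame({
    "bread":  [True,  True,  False, True],
    "butter": [True,  False, True,  False],
    "jam":    [False, True,  False, True]
})
print(onehot_example)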
Exercise instructions
- Apply the Apriori algorithm to the DataFrame onehot.
- Compute the association rules using the support metric and a minimum threshold of 0.0.
- Complete the expression for the scatterplot such that the dot size is scaled by lift.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import matplotlib and seaborn under their standard aliases
import matplotlib.pyplot as plt
import seaborn as sns
# Apply the Apriori algorithm with a support value of 0.0075
frequent_itemsets = ____(____, min_support = 0.0075,
use_colnames = True, max_len = 2)
# Generate association rules without performing additional pruning
rules = ____(frequent_itemsets, metric = "support",
min_threshold = ____)
# Generate scatterplot using support and confidence
sns.scatterplot(x = "support", y = "confidence",
size = "____", data = rules)
plt.show()
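For reference, one possible completed version is sketched below. It assumes that apriori() and association_rules() come from mlxtend.frequent_patterns; in the exercise environment they are already imported, so the mlxtend import is only needed if you run this elsewhere.

# Completed sketch (assumes onehot is the preloaded one-hot encoded DataFrame;
# the mlxtend import reflects an assumption about where the functions come from)
import seaborn as sns
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori, association_rules

# Apply the Apriori algorithm with a minimum support of 0.0075
frequent_itemsets = apriori(onehot, min_support = 0.0075,
                            use_colnames = True, max_len = 2)

# Generate association rules without performing additional pruning
rules = association_rules(frequent_itemsets, metric = "support",
                          min_threshold = 0.0)

# Plot support against confidence, scaling the dot size by lift
sns.scatterplot(x = "support", y = "confidence",
                size = "lift", data = rules)
plt.show()

If the Bayardo-Agrawal result holds for this data, the largest dots should cluster along the upper support-confidence border, which is exactly what you want to show the founder.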