Hierarchical clustering: Ward method
It is time for Comic-Con! Comic-Con is an annual comic-based convention held in major cities around the world. You have last year's footfall data: the number of people on the convention grounds at a given time. You want to choose the location of your stall to maximize sales. Using the Ward method, apply hierarchical clustering to find the two main points of attraction in the area.
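At each step, the Ward method merges the pair of clusters whose union causes the smallest increase in the total within-cluster sum of squared distances to the centroid (SSE). The minimal NumPy sketch below computes that merge cost two ways: directly as SSE(A ∪ B) - SSE(A) - SSE(B), and with the closed form |A||B| / (|A| + |B|) * ||c_A - c_B||^2. The two small point sets are hypothetical and are not taken from the comic_con data.

```python
import numpy as np


def sse(points: np.ndarray) -> float:
    """Sum of squared Euclidean distances from each point to the centroid."""
    centroid = points.mean(axis=0)
    return float(((points - centroid) ** 2).sum())


def ward_cost_direct(a: np.ndarray, b: np.ndarray) -> float:
    """Increase in total SSE caused by merging clusters a and b."""
    merged = np.vstack([a, b])
    return sse(merged) - sse(a) - sse(b)


def ward_cost_closed_form(a: np.ndarray, b: np.ndarray) -> float:
    """Same quantity via |A||B| / (|A| + |B|) * ||c_A - c_B||^2."""
    n_a, n_b = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return n_a * n_b / (n_a + n_b) * float(diff @ diff)


# Hypothetical 2-D clusters, for illustration only
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
cluster_b = np.array([[4.0, 4.0], [5.0, 5.0]])

print(ward_cost_direct(cluster_a, cluster_b))
print(ward_cost_closed_form(cluster_a, cluster_b))
```

Applying this cost greedily, always merging the cheapest pair until a single cluster remains, builds the Ward hierarchy that is later cut into flat clusters.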
The data is stored in a pandas DataFrame, comic_con. Its columns x_scaled and y_scaled hold the standardized X and Y coordinates of people at a given point in time.
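The exercise does not say how x_scaled and y_scaled were produced. One common way to standardize coordinates is to z-score each column, subtracting its mean and dividing by its standard deviation. The minimal pandas sketch below assumes hypothetical raw columns x and y with made-up values. Some workflows instead use scipy.cluster.vq.whiten, which divides each feature by its standard deviation without centering it.

```python
import pandas as pd

# Hypothetical raw coordinates (made-up values, not the comic_con data)
raw = pd.DataFrame({"x": [17, 20, 35, 14, 37, 33], "y": [4, 6, 0, 0, 4, 3]})

# z-score each column: zero mean, unit (sample) standard deviation
raw["x_scaled"] = (raw["x"] - raw["x"].mean()) / raw["x"].std()
raw["y_scaled"] = (raw["y"] - raw["y"].mean()) / raw["y"].std()

print(raw[["x_scaled", "y_scaled"]].describe().loc[["mean", "std"]])
```

Putting both axes on a comparable scale matters here because Ward linkage uses Euclidean distances, so an axis with a larger raw spread would otherwise dominate the clustering.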
This exercise is part of the course Cluster Analysis in Python.
Instructions
- Import fcluster and linkage from scipy.cluster.hierarchy.
- Use the ward method in the linkage() function.
- Assign cluster labels by forming 2 flat clusters from distance_matrix.
- Run the plotting code to see the results.
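The steps above can be put together in the following sketch. Because comic_con itself is not included here, the sketch builds a hypothetical stand-in DataFrame from two synthetic, well-separated crowds of standardized points. The calls follow the instructions: linkage() with the Ward method, then fcluster() with criterion='maxclust' to form 2 flat clusters. Plotting is left out so the sketch runs without a display.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical stand-in for comic_con: two synthetic crowds of people
rng = np.random.default_rng(seed=0)
crowd_1 = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(50, 2))
crowd_2 = rng.normal(loc=[1.5, 1.0], scale=0.3, size=(50, 2))
comic_con = pd.DataFrame(np.vstack([crowd_1, crowd_2]),
                         columns=["x_scaled", "y_scaled"])

# Ward linkage on the standardized coordinates; Ward is only
# well-defined with Euclidean distance. The result is a linkage matrix.
distance_matrix = linkage(comic_con[["x_scaled", "y_scaled"]].to_numpy(),
                          method="ward", metric="euclidean")

# Cut the hierarchy into at most 2 flat clusters; labels start at 1
comic_con["cluster_labels"] = fcluster(distance_matrix, 2, criterion="maxclust")

print(comic_con["cluster_labels"].value_counts())
```

Despite its name in the exercise, distance_matrix holds the linkage matrix returned by linkage(). criterion='maxclust' treats the second argument of fcluster() as the maximum number of clusters to form.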
Try this exercise by filling in the blanks (____) in the sample code below.
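# Import the fcluster and linkage functions
from scipy.cluster.hierarchy import ____, ____
# Plotting libraries (comic_con itself is assumed to be preloaded)
import matplotlib.pyplot as plt
import seaborn as sns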
# Use the linkage() function with the ward method (it returns a linkage matrix)
distance_matrix = ____(comic_con[['x_scaled', 'y_scaled']], ____=____, metric='euclidean')
# Assign cluster labels by cutting the hierarchy into 2 flat clusters
comic_con['cluster_labels'] = ____(____, ____, criterion='maxclust')
# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled',
                hue='cluster_labels', data=comic_con)
plt.show()
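The plot shows the two clusters. To turn them into concrete candidate stall locations, one option (an assumption, not a step of the exercise) is to take the mean standardized coordinates of each cluster as its center of attraction. The minimal sketch below uses a small hypothetical DataFrame that has the same columns comic_con has after clustering.

```python
import pandas as pd

# Hypothetical labeled data shaped like comic_con after clustering
labeled = pd.DataFrame({
    "x_scaled": [-1.0, -1.2, -0.8, 1.5, 1.7, 1.3],
    "y_scaled": [-1.0, -0.9, -1.1, 1.0, 1.2, 0.8],
    "cluster_labels": [1, 1, 1, 2, 2, 2],
})

# Mean position of each cluster: a rough center of attraction
centers = labeled.groupby("cluster_labels")[["x_scaled", "y_scaled"]].mean()
print(centers)
```

These centers are in standardized units. Mapping them back to the original coordinates would require the mean and standard deviation used for scaling.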