Hierarchical clustering: ward method
It is time for Comic-Con! Comic-Con is an annual comic-based convention held in major cities in the world. You have the data of last year's footfall, the number of people at the convention ground at a given time. You would like to decide the location of your stall to maximize sales. Using the ward method, apply hierarchical clustering to find the two points of attraction in the area.
The data is stored in a pandas DataFrame, comic_con
. x_scaled
and y_scaled
are the column names of the standardized X and Y coordinates of people at a given point in time.
This exercise is part of the course
Cluster Analysis in Python
Exercise instructions
- Import
fcluster
andlinkage
fromscipy.cluster.hierarchy
. - Use the
ward
method in thelinkage()
function. - Assign cluster labels by forming 2 flat clusters from
distance_matrix
. - Run the plotting code to see the results.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the fcluster and linkage functions
from scipy.cluster.hierarchy import ____, ____
# Use the linkage() function
distance_matrix = ____(comic_con[['x_scaled', 'y_scaled']], ____ = ____, metric = 'euclidean')
# Assign cluster labels
comic_con['cluster_labels'] = ____(____, ____, criterion='maxclust')
# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled',
hue='cluster_labels', data = comic_con)
plt.show()