Hierarchical clustering: ward method

It is time for Comic-Con! Comic-Con is an annual comic-based convention held in major cities around the world. You have last year's footfall data: the number of people on the convention grounds at a given time. You would like to decide where to place your stall to maximize sales. Using the ward method, apply hierarchical clustering to find the two points of attraction in the area.

The data is stored in a pandas DataFrame, comic_con. Its columns x_scaled and y_scaled hold the standardized X and Y coordinates of people at a given point in time.
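The exercise does not show how comic_con was built. As an assumption, the sketch below generates synthetic footfall around two hypothetical hotspots and scales each coordinate with scipy.cluster.vq.whiten, which divides a column by its standard deviation without subtracting the mean. The raw column names x_coordinate and y_coordinate, the hotspot locations and the crowd sizes are all hypothetical; only x_scaled and y_scaled come from the exercise.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import whiten

# Hypothetical footfall snapshot: two crowds gathered around two assumed hotspots
rng = np.random.default_rng(42)
crowd_a = rng.normal(loc=[10.0, 20.0], scale=2.0, size=(100, 2))
crowd_b = rng.normal(loc=[40.0, 45.0], scale=2.0, size=(100, 2))

# Raw column names are hypothetical; only x_scaled/y_scaled come from the exercise
comic_con = pd.DataFrame(np.vstack([crowd_a, crowd_b]),
                         columns=['x_coordinate', 'y_coordinate'])

# whiten() divides each column by its standard deviation (it does not subtract the mean)
comic_con['x_scaled'] = whiten(comic_con['x_coordinate'].to_numpy())
comic_con['y_scaled'] = whiten(comic_con['y_coordinate'].to_numpy())

print(comic_con[['x_scaled', 'y_scaled']].std(ddof=0))
```

Both printed standard deviations should be approximately 1, so neither axis dominates the Euclidean distances used later.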

This exercise is part of the course Cluster Analysis in Python.

Exercise instructions

  • Import fcluster and linkage from scipy.cluster.hierarchy.
  • Use the ward method in the linkage() function.
  • Assign cluster labels by forming 2 flat clusters from distance_matrix (the linkage matrix returned by linkage()).
  • Run the plotting code to see the results.
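The steps above can be sketched end to end on a small synthetic stand-in for comic_con. The two Gaussian crowds below are hypothetical data, not the course dataset. The sketch uses scipy's linkage() with method='ward' and then fcluster() with criterion='maxclust', which cuts the hierarchy so that at most t flat clusters remain.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical stand-in for comic_con: two well-separated crowds, already scaled
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=[-1.0, -1.0], scale=0.2, size=(50, 2)),
    rng.normal(loc=[1.5, 1.5], scale=0.2, size=(50, 2)),
])
comic_con = pd.DataFrame(points, columns=['x_scaled', 'y_scaled'])

# Ward linkage: at each step, merge the two clusters whose union causes the
# smallest increase in total within-cluster sum of squares (needs Euclidean distance)
distance_matrix = linkage(comic_con[['x_scaled', 'y_scaled']].to_numpy(),
                          method='ward', metric='euclidean')

# Cut the hierarchy so that at most 2 flat clusters remain; labels start at 1
comic_con['cluster_labels'] = fcluster(distance_matrix, 2, criterion='maxclust')

print(comic_con['cluster_labels'].value_counts().sort_index())
```

With crowds this far apart, each crowd should end up with a single label; which crowd gets label 1 and which gets label 2 is arbitrary.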

Have a go at this exercise by completing this sample code.

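# Import the fcluster and linkage functions
from scipy.cluster.hierarchy import ____, ____

# Plotting libraries used at the end of the script
import matplotlib.pyplot as plt
import seaborn as sns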

# Use the linkage() function
distance_matrix = ____(comic_con[['x_scaled', 'y_scaled']], ____=____, metric='euclidean')

# Assign cluster labels
comic_con['cluster_labels'] = ____(____, ____, criterion='maxclust')

# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled',
                hue='cluster_labels', data=comic_con)
plt.show()
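The scatter plot shows the two groups, but the exercise stops short of naming stall locations. One possible follow-up, which is an assumption and not part of the exercise, is to summarize each cluster by its mean scaled position and head count. The sketch below runs on hypothetical synthetic data and also inspects distance_matrix: despite its name, linkage() returns a linkage matrix with one row per merge (n - 1 rows for n points), whose four columns are the two merged cluster indices, the merge distance, and the size of the new cluster.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical scaled footfall data: a larger and a smaller crowd
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(loc=[-1.0, 0.5], scale=0.15, size=(60, 2)),
    rng.normal(loc=[1.0, -0.5], scale=0.15, size=(40, 2)),
])
comic_con = pd.DataFrame(points, columns=['x_scaled', 'y_scaled'])

# Same steps as the exercise: Ward linkage, then 2 flat clusters
distance_matrix = linkage(comic_con[['x_scaled', 'y_scaled']].to_numpy(), method='ward')
comic_con['cluster_labels'] = fcluster(distance_matrix, 2, criterion='maxclust')

# Each row of the linkage matrix records one merge:
# [cluster index, cluster index, merge distance, size of the new cluster]
print(distance_matrix.shape)  # (99, 4) for 100 points

# Mean scaled position and head count per cluster: rough candidate stall spots
summary = comic_con.groupby('cluster_labels').agg(
    x_center=('x_scaled', 'mean'),
    y_center=('y_scaled', 'mean'),
    people=('x_scaled', 'count'),
)
print(summary.sort_values('people', ascending=False))
```

The centers are in scaled units; mapping them back to venue coordinates would require the scaling factors used to build x_scaled and y_scaled, which the exercise does not provide.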