
K-means clustering: first exercise

This exercise familiarizes you with applying k-means clustering to a dataset. We will use the Comic Con dataset to see how k-means clustering works on it.

Recall the two steps of k-means clustering:

  • Define cluster centers through the kmeans() function. It has two required arguments: observations and number of clusters.
  • Assign cluster labels through the vq() function. It has two required arguments: observations and cluster centers.
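
The two steps above can be sketched on a small synthetic dataset. This is a minimal illustration, not the exercise solution: the two blobs, the random seed, and the variable names (observations, labels, distances) are assumptions made for the example, and the data is not the Comic Con dataset.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Hypothetical data: two well-separated 2-D blobs of 50 points each
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(50, 2))
observations = np.vstack([blob_a, blob_b])

# Step 1: compute the cluster centers.
# kmeans() returns the centers and the mean distance of the
# observations to their nearest center (the distortion).
cluster_centers, distortion = kmeans(observations, 2)

# Step 2: assign each observation to its nearest center.
# vq() returns one label per observation and the distance
# from each observation to its assigned center.
labels, distances = vq(observations, cluster_centers)

print(cluster_centers)
print(labels)
```

Which blob receives label 0 and which receives label 1 depends on the random initialization inside kmeans(), so the label values may swap between runs while the grouping stays the same.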

The data is stored in a pandas DataFrame, comic_con. x_scaled and y_scaled are the column names of the standardized X and Y coordinates of people at a given point in time.
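
As a rough sketch of how standardized columns such as x_scaled and y_scaled could be produced, the example below uses the whiten() function from scipy.cluster.vq, which divides each column by its standard deviation. Whether the Comic Con data was scaled this way is an assumption; the raw column names and values here are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import whiten

# Hypothetical raw coordinates (not the actual Comic Con data)
df = pd.DataFrame({
    "x_coordinate": [17.0, 20.0, 35.0, 14.0, 37.0],
    "y_coordinate": [4.0, 6.0, 0.0, 0.0, 4.0],
})

# whiten() rescales each column to unit standard deviation.
# It does not subtract the mean.
scaled = whiten(df[["x_coordinate", "y_coordinate"]].to_numpy())
df["x_scaled"] = scaled[:, 0]
df["y_scaled"] = scaled[:, 1]

print(df)
```

Scaling the columns this way keeps one coordinate from dominating the distance calculation that k-means relies on.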

This exercise is part of the course

Cluster Analysis in Python


Exercise instructions

  • Import the kmeans and vq functions from SciPy.
  • Generate cluster centers using the kmeans() function with two clusters.
  • Create cluster labels using these cluster centers.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the kmeans and vq functions
from ____.cluster.vq import ____, ____

# Generate cluster centers
cluster_centers, distortion = ____

# Assign cluster labels
comic_con['cluster_labels'], distortion_list = ____

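# Plot clusters (sns and plt are assumed to be seaborn and
# matplotlib.pyplot, preloaded in the exercise environment)
sns.scatterplot(x='x_scaled', y='y_scaled',
                hue='cluster_labels', data=comic_con)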
plt.show()