K-means clustering: first exercise

This exercise familiarizes you with k-means clustering by applying it to the Comic Con dataset and inspecting the resulting clusters.

Recall the two steps of k-means clustering:

  • Define the cluster centers with the kmeans() function. It has two required arguments: the observations and the number of clusters.
  • Assign cluster labels with the vq() function. It has two required arguments: the observations and the cluster centers.
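
The two steps above can be sketched on synthetic data. This is only an illustration, not the exercise solution: the array `observations` and the two artificial blobs are assumptions made up for this example, and the real exercise uses the comic_con DataFrame instead.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Hypothetical data: two well-separated blobs of 20 points each
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))
observations = np.vstack([blob_a, blob_b])

# Step 1: define the cluster centers (observations, number of clusters)
centers, distortion = kmeans(observations, 2)

# Step 2: assign each observation to its nearest center
labels, distances = vq(observations, centers)

print(centers.shape)   # one row per cluster center
print(labels[:5])      # integer index of the assigned center
```

Here kmeans() returns the centers together with a single distortion value, while vq() returns one label and one distance per observation. Which blob receives label 0 and which receives label 1 depends on the random initialization.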

The data is stored in a pandas DataFrame, comic_con. x_scaled and y_scaled are the column names of the standardized X and Y coordinates of people at a given point in time.
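
The text does not say how the standardized columns were produced. One plausible route, shown here purely as an assumption, is the whiten() function from scipy.cluster.vq, which divides each column by its standard deviation. The raw column names `x_coordinate` and `y_coordinate` and the values below are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import whiten

# Hypothetical raw coordinates; the real comic_con data is not reproduced here
comic_con = pd.DataFrame({
    "x_coordinate": [17.0, 20.0, 35.0, 14.0, 37.0],
    "y_coordinate": [4.0, 6.0, 0.0, 0.0, 4.0],
})

# whiten() rescales each column to unit standard deviation (no mean-centering)
raw = comic_con[["x_coordinate", "y_coordinate"]].to_numpy(dtype=float)
scaled = whiten(raw)
comic_con["x_scaled"] = scaled[:, 0]
comic_con["y_scaled"] = scaled[:, 1]

print(comic_con[["x_scaled", "y_scaled"]].std(ddof=0))
```

Scaling both columns to the same spread keeps one coordinate from dominating the Euclidean distances that k-means relies on.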

This exercise is part of the course

Cluster Analysis in Python

Exercise instructions

  • Import the kmeans and vq functions from SciPy.
  • Generate cluster centers using the kmeans() function with two clusters.
  • Create cluster labels using these cluster centers.

Interactive exercise

Try this exercise by completing the sample code below.

# Import the kmeans and vq functions
from ____.cluster.vq import ____, ____

# Generate cluster centers
cluster_centers, distortion = ____

# Assign cluster labels
comic_con['cluster_labels'], distortion_list = ____

# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled',
                hue='cluster_labels', data=comic_con)
plt.show()