K-means clustering: first exercise
This exercise familiarizes you with applying k-means clustering to a dataset. We will use the Comic Con dataset to see how k-means clustering works on it.
Recall the two steps of k-means clustering:
- Define cluster centers through the kmeans() function. It has two required arguments: observations and number of clusters.
- Assign cluster labels through the vq() function. It has two required arguments: observations and cluster centers.
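The two steps above can be sketched on a small synthetic array rather than on the exercise data. This is only an illustration: the names observations, blob_a, blob_b, centers, labels, and distances are hypothetical, and the toy data is generated here purely for the example.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Hypothetical toy data: two well-separated groups of 2-D points
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(20, 2))
observations = np.vstack([blob_a, blob_b])

# Step 1: kmeans() takes the observations and the number of clusters,
# and returns the cluster centers together with the overall distortion
centers, distortion = kmeans(observations, 2)

# Step 2: vq() takes the observations and the cluster centers,
# and returns one label per observation plus its distance to that center
labels, distances = vq(observations, centers)

print(centers)
print(labels)
```

Note that kmeans() returns only the centers and a single distortion value; the per-observation labels come from the separate vq() call.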
The data is stored in a pandas DataFrame, comic_con. x_scaled and y_scaled are the column names of the standardized X and Y coordinates of people at a given point in time.
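How the scaled columns were produced is not stated here. As a rough sketch, and assuming the whiten() function from scipy.cluster.vq was used, standardized columns like these could be created as follows. The DataFrame df, the raw column names x_coordinate and y_coordinate, and the values are all hypothetical.

```python
import pandas as pd
from scipy.cluster.vq import whiten

# Hypothetical raw coordinates; not the actual Comic Con data
df = pd.DataFrame({
    'x_coordinate': [17.0, 20.0, 35.0, 14.0, 37.0],
    'y_coordinate': [4.0, 6.0, 0.0, 0.0, 4.0],
})

# whiten() divides each column by its standard deviation
# (it rescales the values but does not subtract the mean)
scaled = whiten(df[['x_coordinate', 'y_coordinate']].to_numpy())

df['x_scaled'] = scaled[:, 0]
df['y_scaled'] = scaled[:, 1]

print(df)
```

After this rescaling each scaled column has unit standard deviation, so neither coordinate dominates the distance calculation in k-means.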
This exercise is part of the course Cluster Analysis in Python.
Exercise instructions
- Import the kmeans and vq functions from SciPy.
- Generate cluster centers using the kmeans() function with two clusters.
- Create cluster labels using these cluster centers.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the kmeans and vq functions
from ____.cluster.vq import ____, ____
# Generate cluster centers
cluster_centers, distortion = ____
# Assign cluster labels
comic_con['cluster_labels'], distortion_list = ____
# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled',
                hue='cluster_labels', data=comic_con)
plt.show()