Build segmentation with k-means clustering
In this exercise, you will build the customer segmentation with the KMeans algorithm. As you identified in the previous step, the mathematically optimal number of clusters is either 3 or 4. Here, you will build a segmentation with 4 segments.
The pre-processed dataset has been loaded as wholesale_scaled_df. You will use it to run the KMeans algorithm. The raw, unprocessed dataset has been loaded as wholesale; you will later use it to explore the average column values for the 4 segments you build.
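The range of 3 to 4 clusters comes from the previous step. One common way to arrive at such a range is the elbow method: fit k-means for several values of k and look for the point where the within-cluster sum of squared errors (scikit-learn's inertia_ attribute) stops dropping sharply. The sketch below assumes that approach. It uses synthetic data from make_blobs as a hypothetical stand-in for wholesale_scaled_df, and scaled_df and sse are illustrative names only.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for wholesale_scaled_df: synthetic data, not the wholesale dataset
X, _ = make_blobs(n_samples=200, n_features=6, centers=4, random_state=0)
scaled_df = pd.DataFrame(StandardScaler().fit_transform(X))

# Fit k-means for k = 1..8 and record the within-cluster sum of squares (inertia_)
sse = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=123)
    model.fit(scaled_df)
    sse[k] = model.inertia_

# Look for the k where the decrease in SSE levels off (the "elbow")
for k, value in sse.items():
    print(k, round(value, 1))
```

Plotting sse against k and picking the bend is the usual way to read this; on real data the bend is often less sharp than on synthetic blobs, which is how a range such as 3 to 4 can come out instead of a single value.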
This exercise is part of the course Machine Learning for Marketing in Python.
Exercise instructions
- Import the KMeans algorithm from the sklearn.cluster module.
- Initialize the KMeans algorithm with 4 clusters and a random state set to 123.
- Fit the model on the pre-processed wholesale_scaled_df dataset.
- Assign the generated labels to a new column called segment in the raw wholesale dataset.
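Two of the steps above rely on KMeans details that are easier to see in isolation: random_state fixes the random centroid initialization, so repeated runs produce the same segments, and labels_ stores one cluster index per input row, in input order. That ordering is what makes it possible to attach the labels to a dataset whose rows line up with the one the model was fit on. The sketch below uses synthetic data as a hypothetical stand-in for the wholesale data; n_init=10 is set explicitly as an assumption, because its default differs across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in data (not the wholesale dataset)
X, _ = make_blobs(n_samples=200, n_features=6, centers=4, random_state=0)
scaled_df = pd.DataFrame(X)

# Two fits with the same random_state give identical cluster assignments
first = KMeans(n_clusters=4, n_init=10, random_state=123).fit(scaled_df)
second = KMeans(n_clusters=4, n_init=10, random_state=123).fit(scaled_df)

# labels_ holds one integer label (0 to 3) per row, in the same order as the input rows
print(first.labels_[:10])
print((first.labels_ == second.labels_).all())
```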
The sample code, with the blanks completed:
# Import the `KMeans` class from the `sklearn.cluster` module
from sklearn.cluster import KMeans
# Initialize `KMeans` with 4 clusters and a fixed random state for reproducibility
kmeans = KMeans(n_clusters=4, random_state=123)
# Fit the model on the pre-processed dataset
kmeans.fit(wholesale_scaled_df)
# Assign the generated labels (one per row, in row order) to a new `segment` column in the raw dataset
wholesale_kmeans4 = wholesale.assign(segment=kmeans.labels_)
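Once the labels sit in the raw data, the exercise text points to the next step: comparing average column values across the 4 segments. One way to do that is a pandas groupby on segment followed by mean(). The sketch below uses synthetic data; raw, scaled, and the column names col_a, col_b, col_c are hypothetical stand-ins for wholesale and wholesale_scaled_df.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins: `raw` plays the role of `wholesale`,
# `scaled` the role of `wholesale_scaled_df`; column names are made up
X, _ = make_blobs(n_samples=200, n_features=3, centers=4, random_state=0)
raw = pd.DataFrame(X, columns=["col_a", "col_b", "col_c"])
scaled = pd.DataFrame(StandardScaler().fit_transform(raw), columns=raw.columns)

# Cluster on the scaled data
kmeans = KMeans(n_clusters=4, n_init=10, random_state=123).fit(scaled)

# Attach the labels to the RAW data, then average each column per segment
raw_kmeans4 = raw.assign(segment=kmeans.labels_)
segment_means = raw_kmeans4.groupby("segment").mean()
print(segment_means.round(2))
```

Averaging the raw rather than the scaled values keeps each segment profile in the original units, which is why the labels are assigned to wholesale rather than to wholesale_scaled_df.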