
Revisiting wholesale data: Exploration

In the previous analysis you found that k = 2 gives the highest average silhouette width. In this exercise you will continue to analyze the wholesale customer data by building and exploring a k-means model with 2 clusters.
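
As a reminder of how such a comparison can be made, the sketch below computes the average silhouette width for several values of k and picks the largest. It is only an illustration under stated assumptions: `toy_spend` is made-up data standing in for `customers_spend`, `avg_sil_width()` is a hypothetical helper, and the approach (`kmeans()` plus `cluster::silhouette()`) is not necessarily the one used in the previous exercise.

```r
library(cluster)  # provides silhouette()

set.seed(42)

# Hypothetical stand-in for customers_spend: two well-separated spending groups
toy_spend <- data.frame(
  Milk    = c(rnorm(30, mean = 2000, sd = 300), rnorm(30, mean = 9000, sd = 300)),
  Grocery = c(rnorm(30, mean = 3000, sd = 300), rnorm(30, mean = 12000, sd = 300))
)

# Hypothetical helper: average silhouette width of a k-means fit with k clusters
avg_sil_width <- function(k) {
  model <- kmeans(toy_spend, centers = k, nstart = 10)
  sil <- silhouette(model$cluster, dist(toy_spend))
  mean(sil[, "sil_width"])
}

# Compare a few candidate values of k and keep the one with the widest silhouette
k_values <- 2:5
sil_widths <- sapply(k_values, avg_sil_width)
best_k <- k_values[which.max(sil_widths)]
```

Printing `sil_widths` next to `k_values` shows how the score changes with k; `best_k` holds the value with the highest average width for this toy data.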

This exercise is part of the course

Cluster Analysis in R


Exercise instructions

  • Build a k-means model called model_customers for the customers_spend data using the kmeans() function with centers = 2.
  • Extract the vector of cluster assignments from the model (model_customers$cluster) and store it in the variable clust_customers.
  • Append the cluster assignments as a column named cluster to the customers_spend data frame and save the result to a new data frame called segment_customers.
  • Calculate the size of each cluster using count().

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

set.seed(42)

# Build a k-means model for customers_spend with k = 2
model_customers <- ___

# Extract the vector of cluster assignments from the model
clust_customers <- ___

# Build the segment_customers data frame
segment_customers <- mutate(___, cluster = ___)

# Calculate the size of each cluster
count(___, ___)

# Calculate the mean for each category
segment_customers %>% 
  group_by(cluster) %>% 
  summarise_all(list(mean))
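
One possible way to fill in the blanks is sketched below. Because the real `customers_spend` data is not included here, the sketch builds a small made-up data frame with the same name; its columns (`Milk`, `Grocery`, `Frozen`) and values are assumptions for illustration only, and `cluster_sizes` and `cluster_means` are hypothetical names added to hold the results.

```r
library(dplyr)

set.seed(42)

# Hypothetical stand-in for customers_spend (columns and values are made up)
customers_spend <- data.frame(
  Milk    = c(rnorm(20, mean = 2000, sd = 300), rnorm(10, mean = 9000, sd = 300)),
  Grocery = c(rnorm(20, mean = 3000, sd = 300), rnorm(10, mean = 12000, sd = 300)),
  Frozen  = c(rnorm(20, mean = 1500, sd = 200), rnorm(10, mean = 800, sd = 200))
)

# Build a k-means model with 2 clusters
model_customers <- kmeans(customers_spend, centers = 2)

# Extract the vector of cluster assignments (one integer label per row)
clust_customers <- model_customers$cluster

# Append the assignments as a new column named cluster
segment_customers <- mutate(customers_spend, cluster = clust_customers)

# Count the rows in each cluster
cluster_sizes <- count(segment_customers, cluster)

# Mean of every spending column within each cluster
cluster_means <- segment_customers %>%
  group_by(cluster) %>%
  summarise_all(list(mean))
```

Comparing the rows of `cluster_means` shows how average spending differs between the two segments, while `cluster_sizes` shows how many customers fall into each one.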