Revisiting wholesale data: Exploration
From the previous analysis, you found that k = 2 has the highest average silhouette width. In this exercise, you will continue to analyze the wholesale customer data by building and exploring a k-means model with 2 clusters.
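As a reminder of how such a comparison can be made, here is a minimal sketch that computes the average silhouette width for several values of k. It is not the previous exercise's code: it uses kmeans() together with silhouette() from the cluster package, and toy_spend is a hypothetical stand-in data frame with made-up column names and values, not the wholesale data.

```r
library(cluster)  # provides silhouette()

set.seed(42)

# Hypothetical stand-in data (an assumption, not the course data):
# two well-separated groups of 20 customers each
toy_spend <- data.frame(
  Milk   = c(rnorm(20, mean = 2000, sd = 200), rnorm(20, mean = 9000, sd = 200)),
  Frozen = c(rnorm(20, mean = 1000, sd = 200), rnorm(20, mean = 6000, sd = 200))
)

# Pairwise Euclidean distances, needed by silhouette()
d <- dist(toy_spend)

# Average silhouette width for k = 2, 3, 4, 5
sil_width <- sapply(2:5, function(k) {
  km <- kmeans(toy_spend, centers = k, nstart = 10)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(sil_width) <- 2:5

# The k with the highest average silhouette width
best_k <- as.integer(names(which.max(sil_width)))
```

On data with two clearly separated groups like this, the highest average width is expected at k = 2, which mirrors the finding the exercise builds on.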
This exercise is part of the course Cluster Analysis in R.
Exercise instructions
- Build a k-means model called model_customers for the customers_spend data using the kmeans() function with centers = 2.
- Extract the vector of cluster assignments from the model (model_customers$cluster) and store this in the variable clust_customers.
- Append the cluster assignments as a column cluster to the customers_spend data frame and save the results to a new data frame called segment_customers.
- Calculate the size of each cluster using count().
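The steps above can be sketched on a small synthetic data set. This is only an illustration under stated assumptions: toy_spend, model_toy, clust_toy and segment_toy are hypothetical names, and the column names and values are invented rather than taken from customers_spend.

```r
library(dplyr)  # provides mutate() and count()

set.seed(42)

# Hypothetical stand-in for customers_spend (made-up values):
# 20 low-spending and 20 high-spending customers
toy_spend <- data.frame(
  Milk    = c(rnorm(20, mean = 2000, sd = 200), rnorm(20, mean = 9000, sd = 200)),
  Grocery = c(rnorm(20, mean = 3000, sd = 200), rnorm(20, mean = 12000, sd = 200)),
  Frozen  = c(rnorm(20, mean = 1000, sd = 200), rnorm(20, mean = 6000, sd = 200))
)

# Step 1: fit k-means with 2 clusters
model_toy <- kmeans(toy_spend, centers = 2)

# Step 2: one cluster label (1 or 2) per row
clust_toy <- model_toy$cluster

# Step 3: attach the labels as a new column
segment_toy <- mutate(toy_spend, cluster = clust_toy)

# Step 4: number of rows in each cluster
cluster_sizes <- count(segment_toy, cluster)
cluster_sizes
```

Because kmeans() starts from randomly chosen centers, the numeric labels 1 and 2 can swap between runs unless the seed is fixed, which is why set.seed() is called first.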
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
library(dplyr)  # provides mutate(), count(), group_by(), summarise_all() and %>%
set.seed(42)
# Build a k-means model for the customers_spend with a k of 2
model_customers <- ___
# Extract the vector of cluster assignments from the model
clust_customers <- ___
# Build the segment_customers data frame
segment_customers <- mutate(___, cluster = ___)
# Calculate the size of each cluster
count(___, ___)
# Calculate the mean for each category
segment_customers %>%
  group_by(cluster) %>%
  summarise_all(list(mean))
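The last step computes per-cluster means with summarise_all(), which dplyr (from version 1.0.0) marks as superseded in favor of across(). A minimal sketch of both forms follows, assuming a tiny hypothetical data frame segment_toy with invented values in place of segment_customers.

```r
library(dplyr)

# Hypothetical stand-in for segment_customers (made-up values)
segment_toy <- data.frame(
  Milk    = c(10, 20, 30, 40),
  Frozen  = c(1, 3, 5, 7),
  cluster = c(1, 1, 2, 2)
)

# Form used in the exercise
means_all <- segment_toy %>%
  group_by(cluster) %>%
  summarise_all(list(mean))

# Form using across(); grouping columns are skipped by everything()
means_across <- segment_toy %>%
  group_by(cluster) %>%
  summarise(across(everything(), mean))

means_across
```

Either form returns one row per cluster with the mean of every non-grouping column, so the choice mainly affects how the code reads under newer dplyr versions.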