Hierarchical clustering: Preparing for exploration
You have now created a potential clustering for the oes data. Before you can explore these clusters with ggplot2, you will need to process the oes data matrix into a tidy data frame with each occupation assigned its cluster.
This exercise is part of the course
Cluster Analysis in R
Exercise instructions
- Create the df_oes data frame from the oes data.matrix, making sure to store the row names as a column (use rownames_to_column() from the tibble library).
- Build the cluster assignment vector cut_oes using cutree() with h = 100,000.
- Append the cluster assignments as a column cluster to the df_oes data frame and save the results to a new data frame called clust_oes.
- Use the pivot_longer() function from the tidyr library to reshape the data into a format amenable for ggplot2 analysis and save the tidied data frame as gathered_oes.
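The cutree() step can be illustrated on a small made-up matrix. This is only a sketch: the toy matrix, its row names, and the cut height of 10 are hypothetical stand-ins chosen for illustration, not values from the oes data. It also shows that a height such as 100,000 has to be typed in R without the thousands separator (for example h = 100000).

```r
# Hypothetical toy matrix: 4 occupations x 3 years (all values are made up)
toy <- matrix(
  c(10, 11, 12,
    11, 12, 13,
    50, 52, 54,
    51, 53, 55),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("occ_a", "occ_b", "occ_c", "occ_d"),
                  c("2001", "2002", "2003"))
)

# Same distance and linkage settings as the exercise code
hc_toy <- hclust(dist(toy, method = "euclidean"), method = "average")

# Cut the tree at a height that lies between the small within-pair
# distances and the large between-pair distance of this toy example.
# The height is a plain number: no comma as a thousands separator.
cut_toy <- cutree(hc_toy, h = 10)

# cutree() returns an integer vector named by the row names of the input
print(cut_toy)
```

The result is one cluster label per row, in the same order as the rows of the matrix, which is what makes it safe to append as a new column afterwards.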
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
dist_oes <- dist(oes, method = 'euclidean')
hc_oes <- hclust(dist_oes, method = 'average')
library(tibble)
library(dplyr)  # provides mutate()
library(tidyr)
# Use rownames_to_column to move the rownames into a column of the data frame
df_oes <- rownames_to_column(as.data.frame(___), var = 'occupation')
# Create a cluster assignment vector at h = 100,000
cut_oes <- cutree(___, h = ___)
# Generate the segmented oes data frame
clust_oes <- mutate(___, cluster = ___)
# Create a tidy data frame by gathering the year and values into two columns
gathered_oes <- pivot_longer(data = ___,
                             cols = -c(occupation, cluster),
                             names_to = "year",
                             values_to = "mean_salary")
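To see what shape the reshaping step produces, the same sequence of calls can be run end to end on a tiny invented matrix. This is a sketch under stated assumptions: the object names ending in _toy, the matrix values, and the cut height of 10 are hypothetical and are not part of the exercise or its solution.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical wide matrix: one row per occupation, one column per year
wide_toy <- rbind(
  occ_a = c(`2001` = 10, `2002` = 11, `2003` = 12),
  occ_b = c(`2001` = 11, `2002` = 12, `2003` = 13),
  occ_c = c(`2001` = 50, `2002` = 52, `2003` = 54),
  occ_d = c(`2001` = 51, `2002` = 53, `2003` = 55)
)

hc_toy <- hclust(dist(wide_toy, method = "euclidean"), method = "average")

# Move the row names into an explicit column
df_toy <- rownames_to_column(as.data.frame(wide_toy), var = "occupation")

# Append the cluster labels; cutree() keeps the row order of the matrix
clust_toy <- mutate(df_toy, cluster = cutree(hc_toy, h = 10))

# Reshape from wide to long: every column except occupation and cluster
# is stacked into a year / mean_salary pair
gathered_toy <- pivot_longer(data = clust_toy,
                             cols = -c(occupation, cluster),
                             names_to = "year",
                             values_to = "mean_salary")

print(gathered_toy)
```

Each occupation now appears once per year, with its cluster label repeated on every row, which is the layout ggplot2 expects when mapping year to one axis and cluster to a color or facet.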