
Hierarchical clustering: Preparing for exploration

You have now created a potential clustering for the oes data. Before you can explore these clusters with ggplot2, you need to convert the oes data matrix into a tidy data frame in which each occupation is assigned its cluster.

This exercise is part of the course

Cluster Analysis in R


Exercise instructions

  • Create the df_oes data frame from the oes matrix, storing the row names as a column (use rownames_to_column() from the tibble package).
  • Build the cluster assignment vector cut_oes using cutree() with h = 100000.
  • Append the cluster assignments as a column named cluster to df_oes and save the result to a new data frame called clust_oes.
  • Use pivot_longer() from the tidyr package to reshape the data into a long format suitable for plotting with ggplot2, and save the tidied data frame as gathered_oes.
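
The four steps above can be tried end to end on a small made-up matrix before working with oes. The sketch below is only an illustration: the toy matrix, its salary figures, the occupation names, and the cut height of 10000 are all invented for this example and say nothing about the real oes data.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy data: 4 occupations x 3 years of made-up mean salaries
toy <- matrix(
  c(30000, 31000, 32000,
    31000, 32000, 33000,
    90000, 92000, 95000,
    91000, 93000, 96000),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Clerk", "Cashier", "Surgeon", "Dentist"),
                  c("2014", "2015", "2016"))
)

# Cluster the occupations with average linkage on Euclidean distances
hc_toy <- hclust(dist(toy, method = "euclidean"), method = "average")

# Step 1: move the row names into an 'occupation' column
df_toy <- rownames_to_column(as.data.frame(toy), var = "occupation")

# Step 2: cut the tree at a height chosen for this toy example only
cut_toy <- cutree(hc_toy, h = 10000)

# Step 3: attach the cluster assignments as a new column
clust_toy <- mutate(df_toy, cluster = cut_toy)

# Step 4: reshape to one row per occupation-year pair
gathered_toy <- pivot_longer(clust_toy,
                             cols = -c(occupation, cluster),
                             names_to = "year",
                             values_to = "mean_salary")

print(gathered_toy)
```

The long result has one row per occupation and year, which is the shape ggplot2 expects when year is mapped to one axis and mean_salary to the other.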

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

dist_oes <- dist(oes, method = 'euclidean')
hc_oes <- hclust(dist_oes, method = 'average')

library(tibble)
library(dplyr)  # provides mutate(), used below
library(tidyr)

# Use rownames_to_column to move the rownames into a column of the data frame
df_oes <- rownames_to_column(as.data.frame(___), var = 'occupation')

# Create a cluster assignment vector at h = 100,000
cut_oes <- cutree(___, h = ___)

# Generate the segmented oes data frame
clust_oes <- mutate(___, cluster = ___)

# Create a tidy data frame by pivoting the year columns into two columns: year and mean_salary
gathered_oes <- pivot_longer(data = ___,
                             cols = -c(occupation, cluster),
                             names_to = "year",
                             values_to = "mean_salary")