Hierarchical clustering: Preparing for exploration
You have now created a potential clustering for the oes data. Before you can explore these clusters with ggplot2, you will need to process the oes data matrix into a tidy data frame with each occupation assigned to its cluster.
This exercise is part of the course
Cluster Analysis in R
Instructions
- Create the df_oes data frame from the oes data matrix, making sure to store the row names as a column (use rownames_to_column() from the tibble library).
- Build the cluster assignment vector cut_oes using cutree() with h = 100,000 (written as 100000 in R code).
- Append the cluster assignments as a column named cluster to the df_oes data frame and save the result to a new data frame called clust_oes.
- Use the pivot_longer() function from the tidyr library to reshape the data into a format suitable for ggplot2 analysis and save the tidied data frame as gathered_oes.
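The four steps above can be sketched on a small toy matrix. This is only an illustration under stated assumptions: the matrix toy, its occupations, its salary values, and the cut height of 10000 are all made up and are not the oes data or the exercise solution.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy matrix: 4 occupations (rows) by 3 years (columns)
toy <- matrix(
  c(30000, 31000, 32000,
    31000, 32000, 33000,
    90000, 95000, 99000,
    92000, 96000, 100000),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Clerk", "Cashier", "Surgeon", "Dentist"),
                  c("2001", "2002", "2003"))
)

# Average-linkage hierarchical clustering on Euclidean distances
hc_toy <- hclust(dist(toy, method = "euclidean"), method = "average")

# Step 1: move the row names into an 'occupation' column
df_toy <- rownames_to_column(as.data.frame(toy), var = "occupation")

# Step 2: cut the tree at a height (10000 is an assumed value for this toy data)
cut_toy <- cutree(hc_toy, h = 10000)

# Step 3: append the cluster assignments as a column
clust_toy <- mutate(df_toy, cluster = cut_toy)

# Step 4: reshape to one row per occupation and year
gathered_toy <- pivot_longer(clust_toy,
                             cols = -c(occupation, cluster),
                             names_to = "year",
                             values_to = "mean_salary")
```

cutree() returns the assignments in the same order as the rows of the matrix passed to dist(), which is why the vector can be appended directly as a column without any joining.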
Hands-on interactive exercise
Try this exercise by completing this sample code.
dist_oes <- dist(oes, method = 'euclidean')
hc_oes <- hclust(dist_oes, method = 'average')
library(tibble)
library(tidyr)
# mutate() comes from dplyr, so load it as well
library(dplyr)
# Use rownames_to_column to move the rownames into a column of the data frame
df_oes <- rownames_to_column(as.data.frame(___), var = 'occupation')
# Create a cluster assignment vector at h = 100,000
cut_oes <- cutree(___, h = ___)
# Generate the segmented oes data frame
clust_oes <- mutate(___, cluster = ___)
# Create a tidy data frame by gathering the year and values into two columns
gathered_oes <- pivot_longer(data = ___,
                             cols = -c(occupation, cluster),
                             names_to = "year",
                             values_to = "mean_salary")
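Once the data is in this long format, one possible way to explore the clusters with ggplot2 is a line plot of salary over time, colored by cluster. The sketch below assumes a made-up data frame, gathered_toy, with the same four columns as gathered_oes; the occupations and values are hypothetical, and the choice of geom_line() is only one option.

```r
library(ggplot2)

# Hypothetical tidy data in the same shape as gathered_oes (values are made up)
gathered_toy <- data.frame(
  occupation  = rep(c("Clerk", "Cashier", "Surgeon", "Dentist"), each = 3),
  cluster     = rep(c(1L, 1L, 2L, 2L), each = 3),
  year        = rep(c("2001", "2002", "2003"), times = 4),
  mean_salary = c(30000, 31000, 32000,
                  31000, 32000, 33000,
                  90000, 95000, 99000,
                  92000, 96000, 100000)
)

# pivot_longer() stores the former column names as text,
# so convert year to a number to get a continuous x-axis
gathered_toy$year <- as.numeric(gathered_toy$year)

# One line per occupation, colored by its cluster assignment
p <- ggplot(gathered_toy,
            aes(x = year, y = mean_salary,
                color = factor(cluster), group = occupation)) +
  geom_line()
```

Wrapping cluster in factor() makes ggplot2 treat the assignments as discrete groups rather than as a continuous color scale.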