1. Reviewing the HC results
Great job, you've successfully analyzed the occupational wage data using hierarchical clustering.
Now, let's briefly discuss these results before moving on to k-means clustering.
2. The dendrogram
Remember that this dendrogram was constructed using Euclidean distance and the average linkage criterion.
This means that at the height of any given branch, all observations belonging to that branch have an average pairwise Euclidean distance less than or equal to the height of that branch.
Rather than using a pre-determined value of k when cutting the tree, you used the structure of the tree to make the decision.
A height of 100,000 seems reasonable when looking at this structure and generates three clusters.
However, it would be just as reasonable to cut higher to create two clusters, or lower to create four.
To better understand the consequence of the cut height, you explored the resulting clusters to see if they make sense.
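The cut-height idea can be sketched in a few lines. This is a minimal Python illustration (the lesson's own tooling may differ), assuming SciPy is available; the six 1-D data points and the cut heights 5 and 50 are hypothetical stand-ins for the wage data and the 100,000 cut.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: three well-separated groups of two observations each
X = np.array([[0.0], [1.0], [10.0], [11.0], [100.0], [101.0]])

# Average linkage on Euclidean distances, as in the lesson
Z = linkage(X, method="average", metric="euclidean")

# Cutting at a height keeps together only branches whose merge height
# (the average distance between the merged groups) is at or below it
clusters_at_5 = fcluster(Z, t=5, criterion="distance")    # low cut: more clusters
clusters_at_50 = fcluster(Z, t=50, criterion="distance")  # higher cut: fewer clusters
print(len(set(clusters_at_5)), len(set(clusters_at_50)))  # → 3 2
```

Raising or lowering `t` is exactly the trade-off described above: there is no single correct height, so the resulting clusters are checked for interpretability.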
3. The trends
More specifically, you plotted the trends of these three clusters and used color to compare and contrast them. Visually, this seems to be a reasonable clustering, with three distinct trends, or slopes, emerging from the three clusters.
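The trend comparison boils down to averaging each cluster's wage series year by year and drawing one line per cluster. A minimal sketch (plotting omitted), where the wage matrix and cluster assignments are hypothetical:

```python
import numpy as np

# Hypothetical wage matrix: rows = occupations, columns = years
wages = np.array([
    [40.0, 42.0, 44.0, 46.0],   # occupation 0
    [41.0, 43.0, 45.0, 47.0],   # occupation 1
    [60.0, 70.0, 80.0, 90.0],   # occupation 2 (steep growth)
    [62.0, 71.0, 82.0, 91.0],   # occupation 3 (steep growth)
    [30.0, 30.0, 31.0, 31.0],   # occupation 4 (flat)
])
clusters = np.array([1, 1, 2, 2, 3])  # assignments from the dendrogram cut

# One average trend line per cluster; these are what get colored and compared
trends = {k: wages[clusters == k].mean(axis=0) for k in np.unique(clusters)}
for k, trend in trends.items():
    print(k, trend)
```

Plotting each `trend` against the years (one color per cluster) reproduces the kind of visual check described above, where differing slopes make the clusters easy to tell apart.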
4. Connecting the two
Based on this analysis, one observation we can make is that two occupations concurrently experienced higher growth in average wages than the others.
These are the Management and Legal occupations.
Good to know when planning a career trajectory huh?
5. Next steps: k-means clustering
Let's revisit this data through the lens of k-means clustering.
In a k-means analysis, you would first need to determine whether any pre-processing steps are necessary. However, we have already explored this in the hierarchical clustering work and know that the data can be used as is.
So the first step will be to empirically estimate the value of k using the two methods you have learned about: the elbow plot and the maximum average silhouette width.
Finally, as with any good clustering analysis, you will analyze the resulting clusters to see whether they make sense and find out what you can learn from them.
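Both estimation methods can be sketched together. This is a minimal Python illustration, assuming scikit-learn is available (the course's own tooling may differ); the two-blob toy data stands in for the wage table.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical toy data: two obvious clusters in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),
               rng.normal(5, 0.3, (20, 2))])

inertias, silhouettes = {}, {}
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                       # plotted for the elbow method
    silhouettes[k] = silhouette_score(X, km.labels_)  # average silhouette width

# The elbow plot is read visually; the silhouette criterion picks the k
# with the maximum average silhouette width
best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```

The elbow method looks for the k where the drop in total within-cluster sum of squares (`inertia_`) levels off, while the silhouette method gives a single number to maximize; agreement between the two strengthens the choice of k.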
6. Let's cluster!
Let's cluster.