Exercise

# Clustering on PCA results

In this final exercise, you will put together several steps you used earlier and, in doing so, you will experience some of the creativity that is typical in unsupervised learning.

Recall from earlier exercises that the PCA model required significantly fewer features to describe 80% and 95% of the variability of the data. In addition to *normalizing* data and potentially avoiding overfitting, PCA also uncorrelates the variables, sometimes improving the performance of other modeling techniques.

Let's see if PCA improves or degrades the performance of hierarchical clustering.

Instructions

**100 XP**

`wisc.pr`

, `diagnosis`

, `wisc.hclust.clusters`

, and `wisc.km`

are still available in your workspace.

- Using the minimum number of principal components required to describe at least 90% of the variability in the data, create a hierarchical clustering model with complete linkage. Assign the results to
`wisc.pr.hclust`

. - Cut this hierarchical clustering model into 4 clusters and assign the results to
`wisc.pr.hclust.clusters`

. - Using
`table()`

, compare the results from your new hierarchical clustering model with the actual diagnoses. How well does the newly created model with four clusters separate out the two diagnoses? - How well do the k-means and hierarchical clustering models you created in previous exercises do in terms of separating the diagnoses? Again, use the
`table()`

function to compare the output of each model with the vector containing the actual diagnoses.