Selecting number of clusters

In this exercise, you will compare the outputs from your hierarchical clustering model to the actual diagnoses. Normally when performing unsupervised learning like this, a target variable isn't available. We do have it with this dataset, however, so it can be used to check the performance of the clustering model.

In supervised learning (that is, when you are trying to predict a target variable of interest and that variable is available in the original data), cluster assignments can be added as a new feature, which may or may not improve the performance of the final model. This exercise will help you determine whether, in this case, hierarchical clustering provides a promising new feature.
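As a rough illustration of what "using clustering to create new features" can look like, the sketch below attaches cluster membership to a data frame as a factor column. The data are simulated, and the names toy.df and toy.tree and the choice of 3 clusters are assumptions made for this example only; they are not part of the course workspace.

```r
# Simulated stand-in data: 30 observations on two numeric variables.
set.seed(42)
toy.df <- data.frame(
  x1 = c(rnorm(15, mean = 0), rnorm(15, mean = 5)),
  x2 = c(rnorm(15, mean = 0), rnorm(15, mean = 5))
)

# Hierarchical clustering on scaled data (hclust uses complete linkage by default).
toy.tree <- hclust(dist(scale(toy.df)))

# Store cluster membership as a factor so a downstream model treats it as categorical.
toy.df$cluster <- factor(cutree(toy.tree, k = 3))

str(toy.df)
```

Whether such a column actually helps a supervised model has to be checked empirically, for example by comparing it against known labels when they are available.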

This exercise is part of the course

Unsupervised Learning in R

Instructions

wisc.data, diagnosis, wisc.pr, pve, and wisc.hclust are available in your workspace.

Use cutree() to cut the tree so that it has 4 clusters. Assign the output to the variable wisc.hclust.clusters.
Use the table() function to compare the cluster membership to the actual diagnoses.
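The two steps above can be sketched on simulated data, so the example runs without the course workspace. The objects toy.data, toy.labels, toy.hclust, and toy.clusters are hypothetical stand-ins for wisc.data, diagnosis, wisc.hclust, and wisc.hclust.clusters, and k = 2 is chosen only because the toy data have two groups.

```r
# Simulated stand-in data: two groups of 20 points in two dimensions.
set.seed(1)
toy.data <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2)
)
toy.labels <- rep(c("B", "M"), each = 20)  # hypothetical known labels

# Build the tree from Euclidean distances using complete linkage.
toy.hclust <- hclust(dist(toy.data), method = "complete")

# Cut the tree into k clusters; the result is one integer cluster id per row.
toy.clusters <- cutree(toy.hclust, k = 2)

# Cross-tabulate cluster membership (rows) against the known labels (columns).
print(table(toy.clusters, toy.labels))
```

In the resulting table, a cluster whose counts fall mostly in one column lines up well with that label, while a cluster split across columns does not separate the labels.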

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Cut tree so that it has 4 clusters: wisc.hclust.clusters


# Compare cluster membership to actual diagnoses
