
Selecting number of clusters

In this exercise, you will compare the outputs from your hierarchical clustering model to the actual diagnoses. Normally when performing unsupervised learning like this, a target variable isn't available. We do have it with this dataset, however, so it can be used to check the performance of the clustering model.

In supervised learning, where you are trying to predict a target variable that is available in the original data, using clustering to create new features may or may not improve the performance of the final model. This exercise will help you determine whether, in this case, hierarchical clustering provides a promising new feature.

This exercise is part of the course

Unsupervised Learning in R


Exercise instructions

wisc.data, diagnosis, wisc.pr, pve, and wisc.hclust are available in your workspace.

  • Use cutree() to cut the tree so that it has 4 clusters. Assign the output to the variable wisc.hclust.clusters.
  • Use the table() function to compare the cluster membership to the actual diagnoses.
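The two steps above can be sketched on stand-in data. This is a minimal illustration, not the course solution: the workspace objects are not available outside the exercise, so the `demo.*` names, the simulated data, and the choice of complete linkage on scaled data are all assumptions made for the example.

```r
# Hypothetical stand-in data: two simulated groups labelled "B" and "M".
# This is NOT the Wisconsin dataset; it only mimics its shape.
set.seed(1)
demo.data <- rbind(
  matrix(rnorm(50 * 2, mean = 0), ncol = 2),
  matrix(rnorm(50 * 2, mean = 4), ncol = 2)
)
demo.diagnosis <- rep(c("B", "M"), each = 50)

# Assumed setup: hierarchical clustering on scaled data, complete linkage
demo.hclust <- hclust(dist(scale(demo.data)), method = "complete")

# Cut the tree so that it has 4 clusters
demo.clusters <- cutree(demo.hclust, k = 4)

# Cross-tabulate cluster membership against the known labels
demo.table <- table(demo.clusters, demo.diagnosis)
print(demo.table)
```

In the exercise workspace, the same pattern would presumably become `cutree(wisc.hclust, k = 4)` assigned to `wisc.hclust.clusters`, followed by `table(wisc.hclust.clusters, diagnosis)`. A cluster whose row is dominated by one diagnosis suggests that cluster membership carries information about the target.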

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Cut tree so that it has 4 clusters: wisc.hclust.clusters


# Compare cluster membership to actual diagnoses