Selecting number of clusters

In this exercise, you will compare the outputs from your hierarchical clustering model to the actual diagnoses. Normally when performing unsupervised learning like this, a target variable isn't available. We do have it with this dataset, however, so it can be used to check the performance of the clustering model.

When performing supervised learning—that is, when you're trying to predict some target variable of interest and that target variable is available in the original data—using clustering to create new features may or may not improve the performance of the final model. This exercise will help you determine if, in this case, hierarchical clustering provides a promising new feature.
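As a rough illustration of what "using clustering to create new features" can look like, the sketch below stores cluster membership as an extra categorical column. It is only a sketch under stated assumptions: the built-in iris data stands in for the exercise data, and the names demo.data and demo.hclust are hypothetical. Whether such a column helps a model has to be checked by fitting the model with and without it.

```r
# Stand-in data (assumption): iris, with Species playing the role of the target.
demo.data <- iris

# Hierarchical clustering on the scaled numeric columns only;
# the target column is deliberately left out of the clustering.
demo.hclust <- hclust(dist(scale(demo.data[, 1:4])))

# Store the cluster assignment as a categorical feature next to the originals.
demo.data$cluster <- factor(cutree(demo.hclust, k = 4))

str(demo.data)
```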

This exercise is part of the course

Unsupervised Learning in R

Exercise instructions

wisc.data, diagnosis, wisc.pr, pve, and wisc.hclust are available in your workspace.

  • Use cutree() to cut the tree so that it has 4 clusters. Assign the output to the variable wisc.hclust.clusters.
  • Use the table() function to compare the cluster membership to the actual diagnoses.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Cut tree so that it has 4 clusters: wisc.hclust.clusters


# Compare cluster membership to actual diagnoses
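
The two steps can be tried outside the course workspace with a self-contained sketch. It assumes the built-in iris data as a stand-in for wisc.data and diagnosis, and it builds its own tree with hclust()'s default complete linkage; the names features, labels, demo.hclust, demo.clusters, and demo.table are hypothetical.

```r
# Stand-in data (assumption): iris replaces the exercise data,
# and Species replaces the diagnosis vector.
features <- scale(iris[, 1:4])
labels <- iris$Species

# Build a tree from Euclidean distances using complete linkage,
# which is hclust()'s default method.
demo.hclust <- hclust(dist(features), method = "complete")

# Cut the tree so that it has 4 clusters.
demo.clusters <- cutree(demo.hclust, k = 4)

# Cross-tabulate cluster membership against the known labels.
demo.table <- table(demo.clusters, labels)
print(demo.table)
```

In the exercise workspace the same pattern would apply to the provided objects: cutree(wisc.hclust, k = 4) assigned to wisc.hclust.clusters, followed by table(wisc.hclust.clusters, diagnosis). A cluster whose row is dominated by a single label indicates that the clustering separates that class well.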