k-means clustering and comparing results
As you now know, the two main types of clustering covered in this course are hierarchical and k-means.
In this exercise, you will create a k-means clustering model on the Wisconsin breast cancer data and compare the results to the actual diagnoses and the results of your hierarchical clustering model. Take some time to see how each clustering model performs in terms of separating the two diagnoses and how the clustering models compare to each other.
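The comparison described above can be sketched outside the exercise environment. Because `wisc.data` is only available inside the exercise session, this minimal sketch uses simulated data instead: `x` and `truth` are hypothetical stand-ins for the features and the diagnoses. It also assumes the tree is cut into 2 groups with `cutree(k = 2)`, which may not match how `wisc.hclust.clusters` was actually produced.

```r
# Simulated stand-in for wisc.data (hypothetical): two groups of 50 points in 3 dimensions
set.seed(1)
x <- rbind(matrix(rnorm(150, mean = 0), ncol = 3),
           matrix(rnorm(150, mean = 4), ncol = 3))
truth <- rep(c("A", "B"), each = 50)  # hypothetical known labels, like diagnosis

x_scaled <- scale(x)

# Hierarchical clustering on Euclidean distances, cut into 2 groups (an assumption)
hc <- hclust(dist(x_scaled), method = "complete")
hc_clusters <- cutree(hc, k = 2)

# k-means with 2 clusters and 20 random starts
km <- kmeans(x_scaled, centers = 2, nstart = 20)

# Cross-tabulate each model against the known labels and against each other
table(km$cluster, truth)
table(hc_clusters, truth)
table(km$cluster, hc_clusters)
```

Cluster numbers from either method are arbitrary, so a good separation shows up as most counts falling on one diagonal of each table, not necessarily the main one.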
This exercise is part of the course Unsupervised Learning in R.
Exercise instructions
wisc.data, diagnosis, and wisc.hclust.clusters are still available.
- Create a k-means model on wisc.data, assigning the result to wisc.km. Be sure to create 2 clusters, corresponding to the actual number of diagnoses. Also, remember to scale the data and run the algorithm from 20 random starts (nstart = 20), keeping the best result, to find a well-performing model.
- Use the table() function to compare the cluster membership of the k-means model to the actual diagnoses contained in the diagnosis vector. How well does k-means separate the two diagnoses?
- Use the table() function to compare the cluster membership of the k-means model to that of the hierarchical clustering model. Recall that the cluster membership of the hierarchical clustering model is contained in wisc.hclust.clusters.
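The two preprocessing choices in the first step, scaling and multiple starts, can be illustrated with a minimal sketch. The data frame `df` and its columns `area` and `smooth` are hypothetical, chosen only to mimic features measured on very different scales; they are not the real wisc.data columns.

```r
set.seed(42)
# Hypothetical data: two features on very different scales
df <- data.frame(
  area   = c(rnorm(30, 500, 50), rnorm(30, 900, 50)),
  smooth = c(rnorm(30, 0.08, 0.01), rnorm(30, 0.12, 0.01))
)

# scale() centers each column to mean 0 and rescales it to standard deviation 1,
# so the large-valued column does not dominate the Euclidean distances k-means uses
df_scaled <- scale(df)
round(colMeans(df_scaled), 10)
apply(df_scaled, 2, sd)

# nstart = 20 runs k-means from 20 random sets of initial centers and keeps
# the run with the lowest total within-cluster sum of squares
km <- kmeans(df_scaled, centers = 2, nstart = 20)
km$tot.withinss
table(km$cluster)
```

Without scale(), the area column (hundreds) would outweigh smooth (hundredths) almost entirely in the distance calculation.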
Have a go at this exercise by completing this sample code.
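# Create a k-means model on wisc.data: wisc.km
wisc.km <- kmeans(scale(wisc.data), centers = 2, nstart = 20)
# Compare k-means to actual diagnoses
table(wisc.km$cluster, diagnosis)
# Compare k-means to hierarchical clustering
table(wisc.km$cluster, wisc.hclust.clusters)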
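When reading the table of k-means clusters against diagnoses, keep in mind that k-means numbers its clusters arbitrarily, so cluster 1 may correspond to either diagnosis. The helper below, `cluster_agreement()`, is hypothetical (not part of the course or of base R) and is a minimal sketch that only handles the two-cluster, two-label case: it reports the fraction of observations on whichever diagonal of the 2 x 2 table fits better.

```r
# Hypothetical helper: fraction of observations matched under the better of the
# two possible cluster-to-label mappings in a 2 x 2 table
cluster_agreement <- function(clusters, labels) {
  tab <- table(clusters, labels)
  stopifnot(all(dim(tab) == c(2, 2)))
  matched <- max(sum(diag(tab)), sum(diag(tab[, 2:1])))
  matched / sum(tab)
}

# Toy example: cluster 2 mostly lines up with "M", cluster 1 with "B"
clusters <- c(2, 2, 2, 1, 1, 1, 1, 2)
labels   <- c("M", "M", "M", "B", "B", "B", "M", "B")
cluster_agreement(clusters, labels)  # 6 of 8 matched: 0.75
```

Swapping the cluster numbers (for example, `3 - clusters`) gives the same agreement, which is the point of checking both diagonals. Inside the exercise, the analogous call would be `cluster_agreement(wisc.km$cluster, diagnosis)`.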