Visualize the clusters
At this point, we have everything required to plot the observations together with the ellipses representing the clusters.
To assign each observation to one of the two clusters, we can use the function clusters() and compare the results with the real labels.
As a reminder, when we used only the variable Weight to cluster the data, we correctly predicted 4500 females and 4556 males.
Let's see whether the clusters separate better when an additional variable is incorporated.
This exercise is part of the course
Mixture Models in R
Exercise instructions
- Use geom_point() to make the scatterplot for Weight and BMI. Add to this plot the two ellipses saved in ellipse_comp_1 and ellipse_comp_2 with the function geom_path().
- Be aware that the ellipses should be transformed into a data frame.
- Colour cluster 1 in red and cluster 2 in blue.
- Estimate the frequency table for the real labels stored in the variable Gender versus the predicted ones estimated by clusters().
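As a rough illustration of the "transform into a data frame" step, the sketch below builds an ellipse as a two-column matrix and converts it with data.frame(). It uses base R only. The mean vector, the covariance matrix, and the names mu, sigma, ellipse_mat and ellipse_df are hypothetical values chosen for the example, not the course's fitted components, and the 95% contour level is an assumption.

```r
# Hypothetical component parameters (illustrative, not the fitted values)
mu <- c(160, 25)                              # assumed means of Weight and BMI
sigma <- matrix(c(400, 30, 30, 9), nrow = 2)  # assumed covariance matrix

# Points on the unit circle
theta <- seq(0, 2 * pi, length.out = 200)
circle <- rbind(cos(theta), sin(theta))

# Stretch the circle by the lower Cholesky factor of sigma, scale it to the
# assumed 95% contour of a bivariate normal, and shift it to the mean
radius <- sqrt(qchisq(0.95, df = 2))
ellipse_mat <- t(mu + radius * t(chol(sigma)) %*% circle)
colnames(ellipse_mat) <- c("x", "y")

# geom_path() expects a data frame, so convert the matrix
ellipse_df <- data.frame(ellipse_mat)
head(ellipse_df)
```

A data frame of this shape, with columns x and y, is what geom_path(data = ..., aes(x = x, y = y)) can draw on top of the scatterplot.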
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Plot the ellipses
gender %>%
  ggplot(aes(x = ___, y = ___)) +
  ___() +
  geom_path(data = data.frame(ellipse_comp_1), aes(x = x, y = y), col = "___") +
  geom_path(data = data.frame(ellipse_comp_2), aes(x = x, y = y), col = "___")
# Check the assignments
table(gender$Gender, clusters(fit_with_cov))
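To show how such a frequency table can be read, here is a small sketch on made-up labels. The vectors real and predicted are hypothetical stand-ins for gender$Gender and the cluster assignments, not the course data. Because cluster numbers are arbitrary, the sketch counts the correct assignments under whichever of the two possible label matchings agrees best.

```r
# Hypothetical labels for six observations (illustrative only)
real <- c("Female", "Female", "Female", "Male", "Male", "Male")
predicted <- c(1, 1, 2, 2, 2, 1)

# Rows are the real labels, columns are the predicted clusters
tab <- table(real, predicted)
print(tab)

# Cluster numbers carry no meaning, so take the better of the two matchings:
# (Female = 1, Male = 2) on the diagonal, or the reverse off the diagonal
on_diag <- sum(diag(tab))
correct <- max(on_diag, sum(tab) - on_diag)

# Share of observations assigned to the cluster matching their real label
accuracy <- correct / sum(tab)
accuracy
```

The same reading applies to the table above: the larger cell in each row gives the number of correctly assigned females and males, which can be compared with the counts obtained when clustering on Weight alone.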