
Visualize the clusters

At this point, we have everything required to plot the observations together with the ellipses representing the clusters.

Also, to assign each observation to one of the two clusters, we can use the function clusters() and compare the results with the real labels. As a reminder, when we used only the variable Weight to cluster the data, we correctly predicted 4500 females and 4556 males. Let's see whether the clusters separate better when an additional variable is incorporated.
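
Cluster numbers are arbitrary, so a frequency table of real labels against predicted clusters has to be read with both possible label mappings in mind. The following is a minimal base-R sketch of that comparison. The vectors `truth` and `pred` are made up for illustration and are not the course data.

```r
# Hypothetical real labels and cluster assignments (not the course data)
truth <- c("Female", "Female", "Female", "Male", "Male", "Male", "Male", "Female")
pred  <- c(1, 1, 2, 2, 2, 2, 1, 1)

# Rows are the real labels, columns are the predicted clusters
tab <- table(truth, pred)
print(tab)

# Cluster 1 may correspond to either label, so count the agreements
# under both mappings and keep the larger one
agree_direct  <- sum(diag(tab))                 # Female = 1, Male = 2
agree_swapped <- sum(tab[cbind(1:2, 2:1)])      # Female = 2, Male = 1
accuracy <- max(agree_direct, agree_swapped) / sum(tab)
print(accuracy)
```

With real data, the same reading applies to the table returned by comparing the labels with clusters(): the larger of the two diagonal sums gives the number of correctly assigned observations.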

This exercise is part of the course

Mixture Models in R

Exercise instructions

  • Use geom_point() to make the scatterplot of Weight and BMI. Add to this plot the two ellipses saved in ellipse_comp_1 and ellipse_comp_2 with the function geom_path().
  • Be aware that each ellipse must be converted into a data frame before it is passed to geom_path().
  • Colour cluster 1 red and cluster 2 blue.
  • Compute the frequency table of the real labels stored in the variable Gender versus the predicted ones returned by clusters().
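
The data frame conversion mentioned above can be illustrated with a small self-contained sketch. It assumes a made-up mean vector `mu` and covariance matrix `sigma` rather than the fitted model from the course, and `ellipse_points()` is a hypothetical helper written here in base R, not a function from any package. The plotting step runs only if ggplot2 is installed.

```r
# Hypothetical component parameters (not estimated from the course data)
mu    <- c(70, 25)
sigma <- matrix(c(100, 20, 20, 9), nrow = 2)

# Hypothetical helper: points on the contour of a bivariate Gaussian
# that encloses `level` of its probability mass, returned as a matrix
ellipse_points <- function(mu, sigma, level = 0.95, n = 100) {
  theta  <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))     # unit circle, n x 2
  radius <- sqrt(qchisq(level, df = 2))
  pts <- radius * circle %*% chol(sigma)      # stretch and rotate
  pts <- sweep(pts, 2, mu, "+")               # shift to the mean
  colnames(pts) <- c("x", "y")
  pts
}

# geom_path() needs a data frame, so convert the matrix
ellipse_df <- data.frame(ellipse_points(mu, sigma))

# Simulated observations from the same hypothetical component
set.seed(1)
obs <- sweep(matrix(rnorm(400), ncol = 2) %*% chol(sigma), 2, mu, "+")
obs_df <- data.frame(x = obs[, 1], y = obs[, 2])

# Scatterplot plus ellipse outline, built only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(obs_df, ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_point() +
    ggplot2::geom_path(data = ellipse_df, ggplot2::aes(x = x, y = y),
                       colour = "red")
}
```

Printing `p` draws the plot. The matrix columns are named x and y so that the same aes(x = x, y = y) mapping used in the exercise template applies after the conversion.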

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Plot the ellipses
gender %>% 
  ggplot(aes(x = ___, y = ___)) + ___() +
  geom_path(data = data.frame(ellipse_comp_1), aes(x = x, y = y), col = "___") +
  geom_path(data = data.frame(ellipse_comp_2), aes(x = x, y = y), col = "___")
# Check the assignments
table(gender$Gender, clusters(fit_with_cov))