Visualize the clusters

At this point, we have everything required to plot the observations together with the ellipses representing the clusters.

Also, if we want to assign each observation to one of the two clusters, we can use the function clusters() and compare the results with the real labels. As a reminder, when we used only the variable Weight to cluster the data, we correctly predicted 4500 females and 4556 males. Let's see whether the clusters separate better when an additional variable is incorporated.

This exercise is part of the course

Mixture Models in R

Exercise instructions

Use geom_point() to make a scatterplot of Weight and BMI. Add the two ellipses saved in ellipse_comp_1 and ellipse_comp_2 to this plot with the function geom_path().
Note that each ellipse must be converted to a data frame first.
Colour cluster 1 in red and cluster 2 in blue.
Build the frequency table of the real labels stored in the variable Gender versus the predicted ones returned by clusters().

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Plot the ellipses
gender %>% 
  ggplot(aes(x = ___, y = ___)) + ___() +
  geom_path(data = data.frame(ellipse_comp_1), aes(x = x, y = y), col = "___") +
  geom_path(data = data.frame(ellipse_comp_2), aes(x = x, y = y), col = "___")
# Check the assignments
table(gender$Gender, clusters(fit_with_cov))
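
The template above depends on objects built in earlier exercises (`gender`, `fit_with_cov`, `ellipse_comp_1`, `ellipse_comp_2`). As a rough, self-contained sketch of the whole pipeline, the code below simulates a hypothetical stand-in for `gender` (the means and covariance are invented purely for illustration, not taken from the course data), fits a two-component bivariate Gaussian mixture with `flexmix`, builds the ellipses with the `ellipse` package, and then draws the plot and the frequency table. It assumes the packages `flexmix`, `ellipse`, `ggplot2` and `MASS` are installed, and it is one possible way to fill in the blanks rather than the official solution.

```r
library(flexmix)  # flexmix(), FLXMCmvnorm(), parameters(), clusters()
library(ellipse)  # ellipse()
library(ggplot2)

set.seed(1)
n <- 500

# Hypothetical stand-in for the course's `gender` data (values are invented)
sim_group <- function(n, mean, label) {
  sigma <- matrix(c(40, 8, 8, 4), nrow = 2)
  xy <- MASS::mvrnorm(n, mu = mean, Sigma = sigma)
  data.frame(Gender = label, Weight = xy[, 1], BMI = xy[, 2])
}
gender <- rbind(
  sim_group(n, c(60, 22), "Female"),
  sim_group(n, c(85, 27), "Male")
)

# Bivariate Gaussian mixture with a full covariance matrix (cross-terms kept)
fit_with_cov <- flexmix(cbind(Weight, BMI) ~ 1,
                        k = 2,
                        data = gender,
                        model = FLXMCmvnorm(diag = FALSE))

# Helper (hypothetical name): the first two parameters of a component are
# its means, the remaining four are its 2 x 2 covariance matrix
make_ellipse <- function(fit, component) {
  comp <- parameters(fit, component = component)
  ellipse(x = matrix(comp[3:6], nrow = 2), centre = comp[1:2], npoints = 200)
}
ellipse_comp_1 <- make_ellipse(fit_with_cov, 1)
ellipse_comp_2 <- make_ellipse(fit_with_cov, 2)

# Scatterplot of the observations with one ellipse per cluster
p <- ggplot(gender, aes(x = Weight, y = BMI)) +
  geom_point(alpha = 0.3) +
  geom_path(data = data.frame(ellipse_comp_1), aes(x = x, y = y), colour = "red") +
  geom_path(data = data.frame(ellipse_comp_2), aes(x = x, y = y), colour = "blue")
print(p)

# Real labels (rows) versus predicted clusters (columns)
assignments <- table(gender$Gender, clusters(fit_with_cov))
print(assignments)
```

Because the component numbering in a mixture model is arbitrary, cluster 1 may correspond to either gender. Read the table by looking for the large counts in each row rather than by expecting them on the diagonal.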


In this chapter, you will be introduced to fundamental concepts in model-based clustering and how this approach differs from other clustering techniques. You will learn the generating process of Gaussian Mixture Models as well as how to visualize the clusters.

Exercise 1: Introduction to model-based clustering
Exercise 2: Clustering approaches
Exercise 3: Explore gender data
Exercise 4: Gaussian distribution
Exercise 5: Sampling a Gaussian distribution
Exercise 6: (not so good) Estimations of the mean and sd
Exercise 7: Gaussian mixture models (GMM)
Exercise 8: Simulate a mixture of two Gaussian distributions
Exercise 9: Plot histogram of Gaussian Mixture
Exercise 10: Mixture of three Gaussian distributions

In this chapter, you will be introduced to the main structure of Mixture Models, how to apply this approach to different kinds of data, and how to estimate the parameters involved. To carry out the estimation, you will learn an iterative method called the Expectation-Maximization (EM) algorithm.

Exercise 1: Structure of mixture models
Exercise 2: Which probability distribution?
Exercise 3: Handwritten digits dataset
Exercise 4: Parameters estimation
Exercise 5: Estimation given the probabilities
Exercise 6: Calculating the probabilities
Exercise 7: EM algorithm
Exercise 8: Expectation function
Exercise 9: Maximization function
Exercise 10: Apply the two steps
Exercise 11: Plot the estimated clusters

This chapter shows how to fit Gaussian Mixture Models in 1 and 2 dimensions with the `flexmix` package. The data consist of 10,000 observations of people, with their weight, height, body mass index and reported gender.

Exercise 1: Univariate Gaussian Mixture Models
Exercise 2: Number of clusters
Exercise 3: Number of parameters
Exercise 4: Univariate Gaussian Mixture Models with flexmix
Exercise 5: Univariate case with flexmix
Exercise 6: Extracting Parameters for Univariate Case
Exercise 7: Visualizing Univariate Gaussian Mixture Model
Exercise 8: Compare the results
Exercise 9: Bivariate Gaussian Mixture Models
Exercise 10: Cross-term from covariance matrix
Exercise 11: Parameters in the bivariate case
Exercise 12: Bivariate Gaussian Mixture Models with flexmix
Exercise 13: Fit the model with cross-terms
Exercise 14: Get the components
Exercise 15: Create the ellipses
Exercise 16: Visualize the clusters (current exercise)

In this module, you will learn how Mixture Models extend to probability distributions other than the Gaussian and how these models are fitted with `flexmix`. The datasets used are images of handwritten digits and the number of crimes in the city of Chicago. For the first dataset, you will find clusters that summarize the handwritten digits; for the second, you will find clusters of communities where it is more or less dangerous to live.

Exercise 1: Bernoulli Mixture Models
Exercise 2: Binary images
Exercise 3: How many values?
Exercise 4: Bernoulli Mixture Models with flexmix
Exercise 5: Handwritten digits with `flexmix`
Exercise 6: Poisson Mixture Models
Exercise 7: Discover the lambda
Exercise 8: Sample from Poisson distribution
Exercise 9: Poisson Mixture Models with flexmix
Exercise 10: Crimes data with `flexmix`