EM algorithm
1. EM algorithm
In the last lesson, you saw that the parameter estimation problem can be divided into two steps. Here I will generalize this method with the Expectation-Maximization algorithm.

2. Same problem, this time for real
So this time, let's complete the estimation of the means and proportions of the two Gaussian distributions that cluster this data. Remember that, to keep things simple, we are assuming both standard deviations are known and equal to one, so we don't need to estimate them.

3. Iteration 0: Initial parameters
We can start the iterative process by guessing arbitrary values. For example, let's naively assume that one Gaussian has a mean of 1 and the other a mean of 2. Moreover, since we don't have a clue about the proportion of each one, let's assign 50% to both.

4. Iteration 0: Initial parameters
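In R, these starting values can be written down directly. The variable names below are illustrative, not the course's exact code:

```r
# Initial guesses for the two Gaussian clusters
# (both standard deviations are assumed known and equal to 1)
means <- c(1, 2)      # arbitrary starting means
props <- c(0.5, 0.5)  # no clue about the proportions, so 50% each
```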
At first sight, it doesn't look like these initial parameters explain the data very well, but this is just our initial belief.

5. Iteration 1: Estimate probabilities (Expectation)
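One way to carry out this probability calculation in R is sketched below; the toy data and variable names are assumptions for illustration, with both standard deviations fixed at 1:

```r
x <- c(1.2, 2.3, 4.8, 5.1, 5.5)  # toy data points
means <- c(1, 2)                 # current means
props <- c(0.5, 0.5)             # current proportions

# Weighted likelihood of each point under each Gaussian (sd = 1)
lik1 <- props[1] * dnorm(x, mean = means[1], sd = 1)
lik2 <- props[2] * dnorm(x, mean = means[2], sd = 1)

# Normalize so that, for each point, the two probabilities sum to one
prob1 <- lik1 / (lik1 + lik2)
prob2 <- lik2 / (lik1 + lik2)
```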
Since we have the means and the proportions, we can estimate the probabilities as you did previously. We call this the expectation step because it gives us the expected probabilities given the parameters.

6. Iteration 1: Estimate parameters (Maximization)
Since we have the probabilities, we can estimate the new means and the new proportions. To do so, we follow the same process as before. For the means, we calculate an average of the data weighted by the corresponding probabilities. This gives us new means of 2.85 and 4.57, respectively. For the proportions, we take the sample mean of the probabilities. This gives us 10 percent for the first cluster and 90 percent for the second. This step is called maximization because, implicitly, we are finding the parameters that maximize a likelihood function, a derivation that is beyond the scope of this course.

7. Iteration 1: Estimate parameters (Maximization)
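The parameter update can be sketched in R as follows; the data points and probabilities below are made up for illustration, and `prob1` and `prob2` stand in for the probability columns produced by the expectation step:

```r
x <- c(1.2, 2.3, 4.8, 5.1, 5.5)           # toy data points
prob1 <- c(0.90, 0.70, 0.10, 0.05, 0.02)  # made-up cluster-1 probabilities
prob2 <- 1 - prob1                        # cluster-2 probabilities

# New means: averages of the data weighted by the probabilities
mean1 <- sum(prob1 * x) / sum(prob1)
mean2 <- sum(prob2 * x) / sum(prob2)

# New proportions: sample means of the probabilities
prop1 <- mean(prob1)
prop2 <- mean(prob2)
```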
Thus, the new clusters after iteration 1 look like the figure shown here. To continue, we use these new parameters to calculate the probabilities for the next iteration, use those probabilities to estimate new parameters, and so on.

8. Expectation-Maximization algorithm
Therefore, we begin with our initial guesses of the means and the proportions. We then calculate the probabilities in the step called expectation, and use these probabilities to estimate the new means and proportions in the step called maximization. The process can go on for a while, but we usually stop it when the new parameters barely change with respect to those from the previous iteration. Each step, expectation and maximization, can be wrapped up into a function.

9. Expectation function
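A minimal sketch of such a function is shown below; the argument names and the column names `prob1` and `prob2` are assumptions, not the course's exact code:

```r
# Expectation: add one probability column per cluster to the data
# (both standard deviations fixed at 1, as in the lesson)
expectation <- function(data, means, props) {
  lik1 <- props[1] * dnorm(data$x, mean = means[1], sd = 1)
  lik2 <- props[2] * dnorm(data$x, mean = means[2], sd = 1)
  data$prob1 <- lik1 / (lik1 + lik2)
  data$prob2 <- lik2 / (lik1 + lik2)
  data
}

# Toy usage
df <- expectation(data.frame(x = c(1.2, 2.3, 4.8, 5.1, 5.5)),
                  means = c(1, 2), props = c(0.5, 0.5))
```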
The expectation step receives the data, the means, and the proportions, and calculates the probabilities. The output of this step is the data with two extra columns containing the probability of each cluster.

10. Maximization function
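Again as a sketch with assumed names, such a function could look like this:

```r
# Maximization: turn the probability columns back into parameter estimates
maximization <- function(data) {
  means <- c(sum(data$prob1 * data$x) / sum(data$prob1),
             sum(data$prob2 * data$x) / sum(data$prob2))
  props <- c(mean(data$prob1), mean(data$prob2))
  list(means, props)  # first element: means; second element: proportions
}

# Toy usage with made-up probabilities
df <- data.frame(x = c(1.2, 2.3, 4.8, 5.1, 5.5),
                 prob1 = c(0.90, 0.70, 0.10, 0.05, 0.02))
df$prob2 <- 1 - df$prob1
est <- maximization(df)
```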
The maximization step receives the data with its probabilities and returns the estimates of the means and the proportions. The first element of the returned list corresponds to the means and the second to the proportions.

11. Iteratively
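Put together, ten iterations of the loop might be sketched as below; the data is simulated here so the example is self-contained, and all names are assumptions rather than the course's exact code:

```r
set.seed(42)
df <- data.frame(x = c(rnorm(25, mean = 3), rnorm(75, mean = 5)))  # toy clusters

# Expectation: probability columns given the current parameters (sd = 1)
expectation <- function(data, means, props) {
  lik1 <- props[1] * dnorm(data$x, mean = means[1], sd = 1)
  lik2 <- props[2] * dnorm(data$x, mean = means[2], sd = 1)
  data$prob1 <- lik1 / (lik1 + lik2)
  data$prob2 <- lik2 / (lik1 + lik2)
  data
}

# Maximization: new means and proportions from the probabilities
maximization <- function(data) {
  list(c(sum(data$prob1 * data$x) / sum(data$prob1),
         sum(data$prob2 * data$x) / sum(data$prob2)),
       c(mean(data$prob1), mean(data$prob2)))
}

means <- c(1, 2)      # initial guesses
props <- c(0.5, 0.5)

for (i in 1:10) {
  df <- expectation(df, means, props)  # E step: probabilities
  est <- maximization(df)              # M step: new parameters
  means <- est[[1]]
  props <- est[[2]]
  cat("Iteration", i, "- means:", round(means, 2),
      "- proportions:", round(props, 2), "\n")
}
```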
Now we can apply these functions iteratively to the data, as shown here with 10 iterations. The expectation takes the data and the initial values and gives back the data with the probabilities. The maximization takes these data to find the new estimates, which become the initial values for the next iteration. The `cat` function is used to print the objects. Observe how the parameters seem to converge over the last iterations.

12. After 10 iterations
After iteration 10, the resulting parameters are a mean of 2.65 and a proportion of 26% for the red cluster, and a mean of 5.02 with a proportion of 74% for the blue cluster.

13. Let's practice!
Now it's time for you to put this into practice.