
Bernoulli Mixture Models

1. Bernoulli Mixture Models

In the previous chapters, we focused on Gaussian Mixture Models, where the variables to be modeled are continuous and normally distributed. Sometimes, though, as we briefly discussed, the Gaussian distribution is not the best choice. In this chapter, I will extend the notion of mixture models to the discrete case, first with the Bernoulli distribution and then with the Poisson distribution.

2. The handwritten digits dataset

The handwritten digits dataset consists of 320 black and white images of handwritten threes and sixes, and the goal is to find two clusters that explain these data. To do so, we need to understand which distribution is suitable to model them.

3. Continuous versus discrete variables

We have seen that the Gaussian distribution works well with continuous outcomes, such as the weight or the body mass index in the gender dataset. But if we want to model an event where the outcome has only two options, like flipping a coin, the Gaussian distribution is no longer appropriate, and we need to introduce a new distribution called the Bernoulli.

4. Bernoulli distribution

This distribution describes situations where there are only two possible outcomes: for example, flipping a coin, where you can get either heads or tails, or the colour of a pixel in a black and white image. The distribution is characterized by a single parameter, the probability of success, usually named p, which can be assigned to either of the two outcomes.

5. Sample of Bernoulli distribution

For example, I can simulate a Bernoulli sample of 100 observations where p, which takes the value 0.7, represents the probability of getting 1, so 1 minus p is the probability of getting 0.
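The course demonstrates this in R; a minimal equivalent sketch in Python with NumPy (the seed and variable names are illustrative choices, not from the course) would be:

```python
import numpy as np

rng = np.random.default_rng(42)  # illustrative seed
p = 0.7
# 100 Bernoulli draws: each observation is 1 with probability p, 0 with probability 1 - p
sample = rng.binomial(n=1, p=p, size=100)
print(sample.mean())  # the sample proportion of ones should be close to 0.7
```

The sample mean estimates p, so with 100 observations it typically lands near 0.7.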

6. Binary image as Bernoulli distributions

Now you can imagine that in a binary image of 16 by 16 pixels, every single pixel can be represented by a Bernoulli distribution where the outcomes are either black or white.

7. Binary image as Bernoulli vector

Moreover, if we join the rows together horizontally, the same image can be described as a long vector of length 256. That is to say, a binary image of 16 by 16 pixels can be expressed as a row of 256 variables, each distributed as a Bernoulli.
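This flattening step can be sketched with NumPy (the toy image below is a made-up example, not one of the course's digits):

```python
import numpy as np

image = np.zeros((16, 16), dtype=int)  # a toy 16-by-16 binary image
image[4:12, 7:9] = 1                   # hypothetical vertical stroke of ones
vector = image.reshape(-1)             # join the rows horizontally into one long vector
print(vector.shape)  # (256,)
```

Each of the 256 positions in `vector` is then treated as its own Bernoulli variable.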

8. Sample of multivariate Bernoulli distribution

For example, suppose that instead of 256 variables we only have an image of three pixels. We could easily generate binary images for one hundred observations, as shown in the code. Every pixel position is represented by its own probability p. The p_vector, formed by the probability of getting 1 for each column, generates this sort of binary image, and because each image is represented by more than one Bernoulli variable, the distribution is a multivariate Bernoulli. For brevity, we will call it just Bernoulli.

9. Bernoulli mixture models

We can now summarize the modeling framework by answering the three mixture model questions for these data. First, we saw that the suitable distribution for these binary images is the Bernoulli distribution. Next, the number of clusters to be considered is going to be two. We could use more than two but the emphasis right now is to illustrate the analysis with this new distribution. And finally, the parameters to be estimated are the proportions of both clusters and the probability representing each pixel. In the next lesson, we will fit this model with flexmix.
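Before turning to flexmix, it may help to see the estimation idea itself. The following is a minimal NumPy sketch of the EM updates for a two-cluster Bernoulli mixture on simulated 3-pixel images; it is an illustration of the technique, not the flexmix implementation, and the generating probabilities and seed are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative seed

# Simulated data: two clusters of 3-pixel binary images (hypothetical probabilities)
X = np.vstack([rng.binomial(1, [0.9, 0.8, 0.1], size=(160, 3)),
               rng.binomial(1, [0.1, 0.2, 0.9], size=(160, 3))])

K, eps = 2, 1e-6
pi = np.full(K, 1.0 / K)                             # cluster proportions
mu = rng.uniform(0.25, 0.75, size=(K, X.shape[1]))   # per-pixel probabilities per cluster

for _ in range(100):
    # E-step: log-likelihood of each image under each cluster, plus log prior
    log_lik = X @ np.log(mu + eps).T + (1 - X) @ np.log(1 - mu + eps).T
    log_lik += np.log(pi)
    # responsibilities (normalized in a numerically stable way)
    resp = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate the proportions and the pixel probabilities
    Nk = resp.sum(axis=0)
    pi = Nk / len(X)
    mu = resp.T @ X / Nk[:, None]

print(np.round(pi, 2))  # estimated cluster proportions
print(np.round(mu, 2))  # estimated per-pixel probabilities
```

The parameters estimated here are exactly the ones named above: the proportions of the two clusters (`pi`) and the probability representing each pixel in each cluster (`mu`).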

10. Let's practice

But first, let's review the topics you've just learned.