Matrix factorization
1. Matrix factorization
Time to understand how matrix factorization can be performed and what value it brings.

2. Why this helps with sparse matrices
As we mentioned in the last video, just as matrices can be multiplied together, they can be broken down into their factors.

3. Why this helps with sparse matrices
A huge benefit of this, when used in recommendation systems, is that the factors can be found as long as there is at least one value in every row and column. In other words, every user has given at least one rating, and every item has been rated at least once. Why is this valuable? Because we can multiply these factors together to create a fully filled-in matrix.

4. Why this helps with sparse matrices
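As a quick sketch of that idea, suppose we already have two factor matrices, as if they had been learned from an incomplete ratings matrix (the numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical factors, as if learned from an incomplete ratings matrix.
user_factors = np.array([[1.2, 0.3],
                         [0.2, 1.1],
                         [0.9, 0.8]])            # 3 users x 2 latent features
item_factors = np.array([[1.0, 0.1, 0.5, 0.2],
                         [0.2, 1.3, 0.4, 0.9]])  # 2 latent features x 4 items

# Multiplying the factors gives a value for every user-item pair,
# including pairs that were missing from the original sparse matrix.
predicted = user_factors @ item_factors
print(predicted.shape)  # (3, 4) -- fully filled in, no gaps
```

Every cell of the product has a value, which is exactly why the factors let us fill in the gaps.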
That's right, it will calculate what values should be in these gaps based on the incomplete matrix's factors. We will go into further depth about how we do this in the next lesson, but first, let's run through how we would factor our matrices.

5. What matrix factorization looks like
Matrix factorization breaks a matrix into two component matrices. Take a rating matrix

6. What matrix factorization looks like
with M users as rows

7. What matrix factorization looks like
and the N items they rated as the columns. Matrix factorization will break this down into one matrix whose height equals the number of users and one matrix whose width equals the number of items.

8. What matrix factorization looks like
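As a minimal sketch of those shapes, here is one way to factor a small ratings matrix with NumPy's truncated SVD; the sizes and the choice of rank are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, k = 5, 7, 2                      # 5 users, 7 items, rank 2
ratings = rng.integers(1, 6, size=(M, N)).astype(float)

# Truncated SVD is one common way to obtain the two factor matrices.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
user_matrix = U[:, :k] * s[:k]   # M x k: height equals the number of users
item_matrix = Vt[:k, :]          # k x N: width equals the number of items

print(user_matrix.shape, item_matrix.shape)  # (5, 2) (2, 7)
```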
The size of the newly created dimensions shown here is called the rank of the matrix; the two must be equal to each other, and the value can be decided by us. You may be wondering what these new unlabeled columns and rows represent: they are called latent features. These are the features that the matrix factorization views as mathematically the best way to describe, or sum up, this dataset in the fewest number of features.

9. Latent features
To explain what that entails, let's take a closer look at a small example. Here we see four users, how they have rated six books, and the decomposed version of the ratings matrix. You can see that the original matrix has six columns, but the first factor matrix only has two columns.

10. Latent features
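The four-user, six-book example can be sketched with NumPy; the rating values below are invented for illustration, with the first three columns standing in for horror/fantasy titles and the last three for romance:

```python
import numpy as np

# Illustrative ratings: 4 users x 6 books
# (first three columns: horror/fantasy; last three: romance).
ratings = np.array([[5., 4., 5., 1., 1., 2.],
                    [4., 5., 4., 2., 1., 1.],
                    [1., 1., 2., 5., 4., 5.],
                    [2., 1., 1., 4., 5., 4.]])

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
user_features = U[:, :2] * s[:2]   # 4 users x 2 latent features

# Users with similar tastes end up with similar latent-feature rows.
print(np.round(user_features, 2))
```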
Taking a look at latent feature 1, we can see that users who gave high ratings to horror and fantasy books got relatively high values for this feature,

11. Latent features
while for latent feature 2, a high value appears to correspond to users who preferred romance novels. This is a simplified example, and latent features often become harder to label with larger datasets, but these are features that the matrix factorization has calculated as representing patterns in the original matrix.

12. Information loss
One question that might come to mind when you see these large DataFrames

13. Information loss
being reduced to much smaller factor matrices is: how can it do this without losing information? In reality, you can't reduce these matrices without at least some information loss - these factors are just close approximations of the original data.

14. Information loss
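A minimal sketch of that approximation, assuming the data is nearly low-rank with a little noise (the construction below is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# A ratings matrix that is nearly rank 2, plus a little noise.
true_users = rng.uniform(0.0, 2.0, size=(4, 2))
true_items = rng.uniform(0.0, 2.0, size=(2, 6))
ratings = true_users @ true_items + rng.normal(0.0, 0.05, size=(4, 6))

# Factor at rank 2, then multiply the factors back together.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
k = 2
approx = (U[:, :k] * s[:k]) @ Vt[:k, :]

# The product of the factors is close to, but not exactly, the original.
print(np.abs(ratings - approx).max())  # nonzero, but small
```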
If we were to multiply the factors back together

15. Information loss
we would actually see a slight difference between the first and last matrix. Even the values we had originally may be off by a small fraction. This is nothing to worry about, but it is worth flagging so you are not surprised when matrices do not exactly match up.

16. Let's practice!
Now it's time to identify some real-world latent features.