
Singular value decomposition (SVD)

1. Singular value decomposition (SVD)

There are many ways to find the factors of a matrix, but we will use a technique called singular value decomposition.

2. What SVD does

Like any matrix factorization approach, singular value decomposition finds factors for your matrix.

3. What SVD does

U is the user matrix

4. What SVD does

V transpose is the features matrix (transpose in this case means that V has been flipped over its diagonal, but we do not need to worry about that here)

5. What SVD does

SVD also generates sigma, as seen here. Sigma is simply a diagonal matrix that can be thought of as the weights of the latent features, or how large an impact each one is calculated to have.
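As a compact summary of the three factors just described, the factorization can be written with its dimensions. The symbols A (the original ratings matrix), m (number of users), n (number of items) and k (number of latent features) are labels introduced here for illustration. The approximation sign reflects that only k latent features are kept.

```latex
A_{m \times n} \;\approx\; U_{m \times k}\,\Sigma_{k \times k}\,V^{T}_{k \times n}
```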

6. Prepping our data

We will once again be working with a DataFrame containing book ratings called book_ratings_df. Before we get started, we should take a look at its dimensions using .shape. For SVD to work optimally, we will need to center the data by subtracting each row's average from that row, as we did in the third chapter. For this we find the row means for each user

7. Prepping our data

and subtract them from the matrix on a row-level. Once the DataFrame has been centered we can fill all the empty values with 0s without influencing the overall ratings.
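The centering steps could be sketched as follows. This is a minimal sketch on a made-up toy DataFrame: only the name book_ratings_df comes from the narration, while the data, avg_ratings and book_ratings_centered are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are books, NaN = not rated.
book_ratings_df = pd.DataFrame(
    {
        "Book A": [5.0, np.nan, 2.0, 4.0],
        "Book B": [4.0, 3.0, np.nan, 5.0],
        "Book C": [np.nan, 1.0, 3.0, 4.0],
    },
    index=["User 1", "User 2", "User 3", "User 4"],
)
print(book_ratings_df.shape)  # (4, 3)

# Mean rating per user (row); NaN values are skipped by default.
avg_ratings = book_ratings_df.mean(axis=1)

# Subtract each user's mean from every rating in that user's row.
book_ratings_centered = book_ratings_df.sub(avg_ratings, axis=0)

# After centering, 0 stands for "this user's average", so it is a neutral fill.
book_ratings_centered = book_ratings_centered.fillna(0)
print(book_ratings_centered)
```

Filling with 0 before centering would instead treat every missing rating as a very low score, which is why the order of the two steps matters.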

8. Applying SVD

With the dataset normalized, we can import the svds function from scipy.sparse.linalg and apply it to our DataFrame. You can specify k, the number of latent features being generated, but in this case, we will use the default of 6. The function returns U, sigma, and Vt. Let's take a look at the shape of each of them. Here, we see U with the same number of rows as the original matrix, and k columns, and Vt with the same number of columns as the original matrix, and k rows.
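A minimal sketch of this step on a made-up centered matrix is shown here. The array values and the name book_ratings_centered are assumptions. Because svds requires k to be smaller than both dimensions of the matrix, this toy example passes k=2 rather than relying on the default of 6.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical centered ratings (5 users x 4 books), missing values already 0.
book_ratings_centered = np.array([
    [ 0.5, -0.5,  0.0,  0.0],
    [ 0.0,  1.0, -1.0,  0.0],
    [-0.5,  0.0,  0.5,  0.0],
    [ 1.0,  0.0, -1.0,  0.0],
    [ 0.0, -1.5,  0.5,  1.0],
])

# k is the number of latent features; it must be below min(5, 4) here.
U, sigma, Vt = svds(book_ratings_centered, k=2)

print(U.shape)      # (5, 2): one row per user, k columns
print(sigma.shape)  # (2,):   k singular values as a 1-D array
print(Vt.shape)     # (2, 4): k rows, one column per book
```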

9. Applying SVD

Finally, we should take a look at sigma. Note that although we expected a diagonal matrix for sigma, we get a one-dimensional array containing only the diagonal values. This can be converted to a diagonal matrix using numpy's diag function.
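The conversion might look like this. The singular values below are invented stand-ins with the same shape as the array svds returns, and sigma_matrix is a name chosen for illustration.

```python
import numpy as np

# Hypothetical singular values, shaped like the 1-D array returned by svds.
sigma = np.array([1.7, 3.2])

# np.diag places a 1-D array on the diagonal of a square matrix of zeros.
sigma_matrix = np.diag(sigma)

print(sigma_matrix.shape)  # (2, 2)
print(sigma_matrix)
```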

10. Getting the final matrix

Now that we have the full factor matrices, we can multiply them together to find the full utility matrix.

11. Getting the final matrix

We find the dot product of U and sigma

12. Getting the final matrix

and then find the dot product of the result and V transpose

13. Getting the final matrix

to get the full matrix.

14. Calculating the product in Python

This can be done in Python using numpy's dot function, which performs matrix multiplication when given two-dimensional arrays. First, we take the dot product of U and sigma.

15. Calculating the product in Python

And then the dot product of the result and V transpose. Note that these numbers look low because they have been centered.
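The two multiplications can be sketched end to end on a toy matrix. The data and the names centered, U_sigma and U_sigma_Vt are assumptions for illustration, and k=2 is used only because the toy matrix is too small for the default.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical centered ratings (5 users x 4 books), missing values already 0.
centered = np.array([
    [ 0.5, -0.5,  0.0,  0.0],
    [ 0.0,  1.0, -1.0,  0.0],
    [-0.5,  0.0,  0.5,  0.0],
    [ 1.0,  0.0, -1.0,  0.0],
    [ 0.0, -1.5,  0.5,  1.0],
])

U, sigma, Vt = svds(centered, k=2)
sigma = np.diag(sigma)            # 1-D singular values -> (2, 2) diagonal matrix

U_sigma = np.dot(U, sigma)        # (5, 2) times (2, 2) gives (5, 2)
U_sigma_Vt = np.dot(U_sigma, Vt)  # (5, 2) times (2, 4) gives (5, 4)

print(U_sigma_Vt.shape)  # (5, 4), same shape as the centered matrix
```

The result is an approximation of the centered matrix built from only k latent features, so its entries are still deviations from each user's average rather than ratings.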

16. Add averages back

Therefore, we need to add back the values we subtracted earlier. We extract the values from the average values series and reshape them so they can be added row-wise to the matrix. Upon inspection, we can see that we have been able to fill in the missing values for the DataFrame with reasonable calculated values. Below we have the original DataFrame for comparison.
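A sketch of adding the averages back is given here. The numbers and the names U_sigma_Vt, avg_ratings and calc_pred_ratings_df are hypothetical stand-ins for the centered reconstruction and the per-user means computed earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical centered reconstruction (2 users x 3 books) and per-user means.
U_sigma_Vt = np.array([
    [ 0.4, -0.4,  0.1],
    [-0.2,  0.9, -0.8],
])
avg_ratings = pd.Series([4.5, 2.0], index=["User 1", "User 2"])

# Reshape (n_users,) to (n_users, 1) so each mean is broadcast across its row.
uncentered = U_sigma_Vt + avg_ratings.values.reshape(-1, 1)

# Restore the row and column labels for easier comparison with the original.
calc_pred_ratings_df = pd.DataFrame(
    uncentered,
    index=avg_ratings.index,
    columns=["Book A", "Book B", "Book C"],
)
print(calc_pred_ratings_df)
```

Without the reshape, NumPy would try to align the means with the columns instead of the rows, which either fails or silently adds the wrong values when the matrix happens to be square.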

17. Let's practice!

Now it's your turn to try it out with the movie dataset.
