Singular value decomposition (SVD)
1. Singular value decomposition (SVD)
There are many ways to find the factors of a matrix, but we will use a technique called singular value decomposition.
2. What SVD does
Like any matrix factorization approach, singular value decomposition finds factors for your matrix.
3. What SVD does
U is the user matrix,
4. What SVD does
V transpose is the features matrix (transposing V flips it over its diagonal, but we do not need to worry about that here),
5. What SVD does
but it also generates sigma, as seen here, which is a diagonal matrix whose values can be thought of as the weights of the latent features, or how large an impact each one is calculated to have.
6. Prepping our data
We will once again be working with a DataFrame containing book ratings called book_ratings_df. Before we get started, we should take a look at its dimensions using its .shape attribute. For SVD to work well, we need to center the data by subtracting each row's average from that row, as we did in the third chapter. To do this, we find the row means for each user
7. Prepping our data
and subtract them from the matrix row by row. Once the DataFrame has been centered, we can fill all the empty values with 0s without influencing the overall ratings, because 0 now corresponds to each user's average.
8. Applying SVD
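Before running the decomposition, the preparation step just described can be sketched roughly as follows. The toy ratings and the names avg_ratings and user_ratings_centered are assumptions made for illustration; only book_ratings_df comes from the lesson.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are books, NaN = not rated.
book_ratings_df = pd.DataFrame(
    {
        "Book A": [5.0, np.nan, 2.0, 4.0],
        "Book B": [3.0, 4.0, np.nan, 1.0],
        "Book C": [np.nan, 2.0, 5.0, 3.0],
    },
    index=["User 1", "User 2", "User 3", "User 4"],
)

# Check the dimensions: (number of users, number of books).
print(book_ratings_df.shape)  # (4, 3)

# Mean rating per user (row); missing values are skipped by default.
avg_ratings = book_ratings_df.mean(axis=1)

# Subtract each user's mean from that user's row.
user_ratings_centered = book_ratings_df.sub(avg_ratings, axis=0)

# After centering, 0 means "average for this user", so it is a neutral fill value.
user_ratings_centered = user_ratings_centered.fillna(0)
print(user_ratings_centered)
```

Filling with 0 only after centering matters: filling the raw ratings with 0 would treat every unrated book as a very low rating.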
With the dataset normalized, we can import svds from scipy.sparse.linalg and apply it to our centered DataFrame. You can specify k, the number of latent features to generate, but in this case we will use the default of 6. The function returns U, sigma, and Vt. Let's take a look at the shape of each of them. Here, we see U with the same number of rows as the original matrix and k columns, and Vt with the same number of columns as the original matrix and k rows.
9. Applying SVD
Finally, we should take a look at sigma. Note that although we expected a diagonal matrix for sigma, we get a one-dimensional array holding only its diagonal values. This can be converted to a diagonal matrix using numpy's diag function.
10. Getting the final matrix
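Before any multiplication, the decomposition step described above might look like the sketch below. The randomly generated 10-by-8 matrix is a stand-in assumption for the centered book ratings, chosen so that the default k of 6 is smaller than both dimensions.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical stand-in for the centered ratings: 10 users by 8 books.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(10, 8)).astype(float)
user_ratings_centered = ratings - ratings.mean(axis=1, keepdims=True)

# Decompose with the default number of latent features (k=6).
U, sigma, Vt = svds(user_ratings_centered)

# U: users x k, sigma: k values, Vt: k x books.
print(U.shape, sigma.shape, Vt.shape)  # (10, 6) (6,) (6, 8)

# sigma comes back as a 1-D array; np.diag turns it into a k x k diagonal matrix.
sigma = np.diag(sigma)
print(sigma.shape)  # (6, 6)
```

Note that svds needs k to be smaller than the smaller dimension of the matrix, so a very small dataset requires passing a lower k explicitly.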
Now that we have the full factor matrices, we can multiply them together to find the full utility matrix.
11. Getting the final matrix
We find the dot product of U and sigma
12. Getting the final matrix
and then find the dot product of the result and V transpose
13. Getting the final matrix
to get the full matrix.
14. Calculating the product in Python
This can be done in Python using numpy's dot function. First, we take the dot product of U and sigma.
15. Calculating the product in Python
And then the dot product of the result and V transpose. Note that these numbers look low because the data was centered before the decomposition.
16. Add averages back
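The two multiplications just described can be sketched as follows. The random input matrix and the names U_sigma and U_sigma_Vt are assumptions for illustration, not taken from the lesson.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical stand-in for the centered ratings matrix (10 users by 8 books).
rng = np.random.default_rng(0)
user_ratings_centered = rng.normal(size=(10, 8))

U, sigma, Vt = svds(user_ratings_centered, k=6)
sigma = np.diag(sigma)  # 1-D array of singular values -> diagonal matrix

# Step 1: dot product of U and sigma.
U_sigma = np.dot(U, sigma)

# Step 2: dot product of that result and V transpose.
U_sigma_Vt = np.dot(U_sigma, Vt)

# The product has the same shape as the original matrix.
print(U_sigma_Vt.shape)  # (10, 8)
```

Because only k latent features are kept, the product is an approximation of the centered matrix rather than an exact copy of it.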
Therefore, we need to add back the values we subtracted earlier. We extract the values from the average ratings Series and reshape them into a column so they can be added back row-wise to the matrix. Upon inspection, we can see that we have been able to fill in the missing values for the DataFrame with reasonable calculated values. Below we have the original DataFrame for comparison.
17. Let's practice!
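To recap the last step in context, here is a minimal end-to-end sketch that finishes by adding the row averages back. The generated ratings, the choice of k=3, and the names uncentered and calc_pred_ratings_df are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse.linalg import svds

# Hypothetical ratings: 8 users by 6 books, one missing rating per user.
rng = np.random.default_rng(42)
ratings = rng.integers(1, 6, size=(8, 6)).astype(float)
for i in range(8):
    ratings[i, i % 6] = np.nan
book_ratings_df = pd.DataFrame(
    ratings,
    index=[f"User {i + 1}" for i in range(8)],
    columns=[f"Book {j + 1}" for j in range(6)],
)

# Center each row on the user's mean, then fill the gaps with 0.
avg_ratings = book_ratings_df.mean(axis=1)
user_ratings_centered = book_ratings_df.sub(avg_ratings, axis=0).fillna(0)

# Decompose and multiply the factors back together (k=3 is an arbitrary choice).
U, sigma, Vt = svds(user_ratings_centered.to_numpy(), k=3)
U_sigma_Vt = np.dot(np.dot(U, np.diag(sigma)), Vt)

# Reshape the averages into a column so they broadcast across each row,
# then add them back to undo the centering.
uncentered = U_sigma_Vt + avg_ratings.to_numpy().reshape(-1, 1)

calc_pred_ratings_df = pd.DataFrame(
    uncentered,
    index=book_ratings_df.index,
    columns=book_ratings_df.columns,
)
print(calc_pred_ratings_df.round(2))  # every cell filled in
print(book_ratings_df)                # original, with NaN gaps, for comparison
```

The reshape to a single column is what makes the addition row-wise: each user's average is added to every entry in that user's row.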
Now it's your turn to try it out with the movie dataset.