User profile recommendations

1. User profile recommendations

In this chapter, you have learned how to use items' attributes to generate content-based recommendations by finding items that are similar to each other.

2. Item to item recommendations

This has many uses such as suggesting obscure books that are similar to your favorite, proposing the next movie to watch that is like the one you just finished, or even finding alternative options when items are out of stock.

3. User profiles

But people are not so one-dimensional that they only like one item. They may have read many different books and want to find one that aligns with their wide array of tastes. For example, taking a look at tfidf_summary_df, which we have used previously, we have a row per book, with a column for each TF-IDF feature (term). For user-based recommendations, we need vectors to represent individual items as well as a vector to represent a user's likes. This will allow us to compare a user's likes to various items to see which items might suit them best.

4. Extract the user data

Let's take an example of a user who has read a set of books. The most straightforward way of creating a user profile is to first get the vectors corresponding to the books they have read, by selecting those rows from tfidf_summary_df with the reindex method. Remember, tfidf_summary_df contains TF-IDF features for all books in our dataset. This creates a DataFrame containing rows only for the books the user has read, along with their TF-IDF scores. This still has multiple rows, and we want a single vector for our user. To go from the full table to a summary of the user's tastes, we can simply find the average of each column, representing the average of the characteristics of the books the user liked.
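As a minimal sketch of this step, the snippet below selects a user's books with reindex. The toy TF-IDF values, the book titles, and the name list_of_books_read are made up for illustration; only tfidf_summary_df and the reindex method come from the lesson.

```python
import pandas as pd

# Hypothetical toy TF-IDF table: one row per book, one column per term.
tfidf_summary_df = pd.DataFrame(
    {
        "ancient": [0.8, 0.6, 0.0, 0.1],
        "magic": [0.1, 0.0, 0.9, 0.7],
        "war": [0.5, 0.7, 0.0, 0.2],
    },
    index=["Book A", "Book B", "Book C", "Book D"],
)

# Hypothetical list of titles the user has read.
list_of_books_read = ["Book A", "Book B"]

# Keep only the rows for the books the user has read.
user_books = tfidf_summary_df.reindex(list_of_books_read)
print(user_books)
```

Note that reindex returns a row of NaN values for any title that is not in the index, so it is worth checking that every title in the list actually exists in tfidf_summary_df.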

5. Build the user profile

We find the average of each column by calling .mean() on the DataFrame. The average values in this Series represent the user profile, or in other words, a way of representing all of the user's preferences at once. For example, this user appears to enjoy books that have high values in the "ancient" TF-IDF feature. This implies that the word "ancient" is prominent in books they like. This profile can, with a bit of reshaping, be used as a vector to compare against other books.
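The averaging and reshaping can be sketched as follows. The values in user_books are invented, and the names user_prof and user_prof_vector are placeholders chosen for this example rather than names confirmed by the lesson.

```python
import pandas as pd

# Hypothetical TF-IDF rows for the two books a user has read.
user_books = pd.DataFrame(
    {"ancient": [0.8, 0.6], "magic": [0.1, 0.0], "war": [0.5, 0.7]},
    index=["Book A", "Book B"],
)

# Average each column to get one value per term: the user profile (a Series).
user_prof = user_books.mean()
print(user_prof)

# Reshape the Series into a 2D array with a single row so it can be
# compared against a table of book vectors.
user_prof_vector = user_prof.values.reshape(1, -1)
print(user_prof_vector.shape)
```

In this toy data the largest profile value is for "ancient", which mirrors the example in the lesson.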

6. Finding recommendations for a user

This user profile can then be used to find the most similar books that they have not yet read. We first must find the subset of books that have not been read by dropping those contained in the list of books the user has read (specifying the index axis by setting axis to 0). We then calculate the cosine similarity as we did in the previous lesson, but this time between the user profile vector you just created and the DataFrame of all the books the user has not read yet. Then we wrap the output in a DataFrame and sort the results once again so we can access and order the data easily.
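A minimal sketch of these steps, assuming scikit-learn's cosine_similarity is the function used in the previous lesson. The toy data and all variable names other than tfidf_summary_df are hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy TF-IDF table: one row per book, one column per term.
tfidf_summary_df = pd.DataFrame(
    {
        "ancient": [0.8, 0.6, 0.0, 0.1],
        "magic": [0.1, 0.0, 0.9, 0.7],
        "war": [0.5, 0.7, 0.0, 0.2],
    },
    index=["Book A", "Book B", "Book C", "Book D"],
)
list_of_books_read = ["Book A", "Book B"]  # hypothetical reading history

# Build the user profile from the books already read.
user_prof = tfidf_summary_df.reindex(list_of_books_read).mean()

# Drop the rows (axis=0) for books the user has already read.
non_user_books = tfidf_summary_df.drop(list_of_books_read, axis=0)

# Compare the single-row profile against every unread book.
similarities = cosine_similarity(
    user_prof.values.reshape(1, -1), non_user_books.values
)

# Wrap the scores in a DataFrame indexed by title and sort, best match first.
similarities_df = pd.DataFrame(
    similarities.T, index=non_user_books.index, columns=["similarity_score"]
)
sorted_similarities_df = similarities_df.sort_values(
    by="similarity_score", ascending=False
)
print(sorted_similarities_df)
```

The similarity output has one row and one column per unread book, which is why it is transposed before being wrapped in a DataFrame.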

7. Getting the top recommendations

After sorting the recommendation scores, you can recommend items based on a user's full history, not just on individual items. The top values are the items most similar to the user's interests across everything they have read, making them good suggestions for the user to read next.
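Pulling out the top suggestions from a table of scores might look like the sketch below. The scores, titles, and the helper top_n_recommendations are hypothetical and only illustrate the sort-then-take-the-top pattern.

```python
import pandas as pd

# Hypothetical similarity scores between a user profile and unread books.
similarities_df = pd.DataFrame(
    {"similarity_score": [0.05, 0.33, 0.71, 0.12]},
    index=["Book C", "Book D", "Book E", "Book F"],
)


def top_n_recommendations(scores, n=2):
    """Return the n rows with the highest similarity scores, best first."""
    return scores.sort_values(by="similarity_score", ascending=False).head(n)


top_books = top_n_recommendations(similarities_df, n=2)
print(top_books)
```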

8. Let's practice!

Great, let's work with the movie dataset to build up user profiles and create recommendations based on them.