
Recommendation systems

1. Recommendation systems

Welcome back! In this video, we'll discuss another common application of embeddings: recommendation systems!

2. Recommendation systems with embeddings

Recommendation systems work almost exactly the same as semantic search engines! We have a group of items that we could potentially recommend, and at least one data point to base the recommendations on. We embed the potential recommendations and the data point we have,

3. Recommendation systems with embeddings

calculate cosine distances,

4. Recommendation systems with embeddings

and recommend the item or items that are closest in the vector space. This similarity in approach means that a lot of the code is exactly the same - let's take a look!

5. Example: Recommended articles

We're returning to the news article dataset to design a recommendation system that recommends the three most similar articles to the one currently being read. This recommendation will be based on the article's headline, topic, and keywords, stored in a dictionary called current_article. This particular article is about the impact of computer hardware on AI advancement. To prepare both of these objects for embedding, we'll first need to combine the features into a single string for each article.
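The current article might be stored in a dictionary shaped something like this (the headline, topic, and keyword values here are placeholders, not the actual dataset's contents):

```python
# Hypothetical shape of the current article; the real values come from the
# news article dataset used in the course.
current_article = {
    "headline": "How New Computer Chips Are Accelerating AI Research",
    "topic": "Tech",
    "keywords": ["ai", "hardware", "gpu"]
}
```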

6. Combining features

We'll be reusing the create_article_text function you may remember from earlier, which extracts the headline, topic, and keywords, and uses an f-string to combine them into a single string. To combine the features, we call the function on each article in articles using a list comprehension, and do the same for the current article. By printing, we can see that the function correctly formatted the current article's information.
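As a rough sketch, assuming articles is a list of dictionaries with the same keys as current_article, this step could look like the following (the exact f-string layout is an assumption):

```python
def create_article_text(article):
    """Combine an article's headline, topic, and keywords into one string."""
    return f"""Headline: {article['headline']}
Topic: {article['topic']}
Keywords: {', '.join(article['keywords'])}"""

# Combine the features of every candidate article and of the current article
article_texts = [create_article_text(article) for article in articles]
current_text = create_article_text(current_article)
print(current_text)
```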

7. Creating embeddings

Next, we embed both sets of article strings using the create_embeddings function from earlier, which allows us to create requests to the OpenAI embedding model in a more repeatable way.
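A minimal sketch of such a helper, assuming the OpenAI Python client (v1+); the specific embedding model is an assumption:

```python
from openai import OpenAI

client = OpenAI(api_key="<OPENAI_API_TOKEN>")  # placeholder key

def create_embeddings(texts):
    """Embed a list of texts and return each embedding as a list of floats."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [item.embedding for item in response.data]

# Embed the candidate articles and the current article
article_embeddings = create_embeddings(article_texts)
current_embedding = create_embeddings([current_text])[0]
```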

8. Finding the most similar article

Finally, to compute the cosine distances and extract the closest articles, we'll use our find_n_closest function, which computes the cosine distances, sorts them, and returns the n smallest along with their indexes. We'll call this function with the embedded current article and the embedded candidate articles, then loop through the results, printing the headline of each of the three most similar articles.
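One way find_n_closest might be implemented, using SciPy's cosine distance (the exact return format is an assumption):

```python
from scipy.spatial import distance

def find_n_closest(query_vector, embeddings, n=3):
    """Return the n smallest cosine distances to query_vector, with their indexes."""
    distances = [
        {"distance": distance.cosine(query_vector, embedding), "index": index}
        for index, embedding in enumerate(embeddings)
    ]
    return sorted(distances, key=lambda d: d["distance"])[:n]

# Recommend the three articles most similar to the current one
hits = find_n_closest(current_embedding, article_embeddings, n=3)
for hit in hits:
    print(articles[hit["index"]]["headline"])
```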

9. Finding the most similar article

There we have it! This is a good start, but a more sophisticated system would base its recommendations not only on the current article, but also on the other articles in the user's reading history.

10. Adding user history

Let's consider that a user has visited two articles, stored in user_history. How can we provide recommendations based on multiple data points?

11. Recommendations on multiple data points

This is the situation we have, where the user has seen two articles, embedded in blue, and we want to recommend the most similar article. Articles they haven't seen yet are shown in red. To find the most similar vector to two vectors, we'll combine

12. Recommendations on multiple data points

the two vectors into one by taking the mean. Then, we'll compute cosine distances as we did before,

13. Recommendations on multiple data points

and recommend the closest vector.

14. Recommendations on multiple data points

If the nearest point has already been viewed, we'll make sure

15. Recommendations on multiple data points

to return the nearest unseen article.

16. Recommendations on multiple data points

The first two steps to embed the user_history are the same as before, combining the features for each article, and embedding the resulting strings. The only difference is that we take the mean to aggregate the two vectors into one that we can compare with the other articles. For the articles to recommend, we filter the list so it only contains articles not in the user_history. Then, as before, we combine the features and embed the text.
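Under the same assumed helpers as before, these steps might look like the following sketch, with NumPy used to take the mean of the history embeddings:

```python
import numpy as np

# Combine features and embed each article in the user's history
history_texts = [create_article_text(article) for article in user_history]
history_embeddings = create_embeddings(history_texts)

# Aggregate the history into a single vector by taking the element-wise mean
mean_history_embedding = np.mean(history_embeddings, axis=0)

# Keep only the articles the user hasn't seen yet
articles_filtered = [article for article in articles if article not in user_history]

# Combine features and embed the remaining candidates
article_texts = [create_article_text(article) for article in articles_filtered]
article_embeddings = create_embeddings(article_texts)
```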

17. Recommendations on multiple data points

Finally, we compute the cosine distances using the same find_n_closest function, only this time passing it the mean of the embedded user history. Then, we subset the filtered article list to find the most similar articles. Here they are! Notice that the article headlined "Tech Giant Buys 49% Stake In AI Startup" wasn't recommended, as the user had already seen it.
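The final lookup reuses find_n_closest with the mean history vector and indexes back into the filtered article list (again a sketch under the same assumed helpers):

```python
# Find the three unseen articles closest to the mean history embedding
hits = find_n_closest(mean_history_embedding, article_embeddings, n=3)
for hit in hits:
    print(articles_filtered[hit["index"]]["headline"])
```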

18. Let's practice!

Now it's your turn!
