1. Multiple queries and filtering
Hi again! In this final lesson, we'll take a look at more advanced ways to use Chroma by performing multiple queries and filtering with metadatas. Let's dive in!
2. Movie recommendations based on multiple datapoints
In the previous chapter, we used embeddings to make recommendations based on multiple data points.
Let's do the same with the Netflix dataset and Chroma. We'll recommend movies related to other titles that a user has seen. Let's assume a user has seen Terrifier, which is a horror film, and Strawberry Shortcake: Berry Bitty Adventures, a kids' TV show.
It's an odd combination, but hopefully it will help differentiate the recommendations.
3. Multiple query texts
Similarly to our previous implementation, we'll use the embedded texts of the reference items as queries.
First, we're using collection-dot-get to retrieve both of our reference texts. Notice that we're only extracting and storing the documents from these items in reference_texts.
Since collection-dot-query supports multiple query texts, we can pass our reference_texts directly; we'll ask for three results.
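The two calls described here might look like the following minimal sketch. It assumes collection is an existing Chroma collection that already holds the Netflix titles; the helper name query_by_reference_ids is hypothetical and not part of Chroma.

```python
def query_by_reference_ids(collection, reference_ids, n_results=3):
    """Use the stored documents of already-seen titles as query texts.

    `collection` is assumed to be a Chroma collection that already
    contains the Netflix titles; this helper name is hypothetical.
    """
    # get() returns a dictionary; keep only the documents of the reference items.
    reference_texts = collection.get(ids=reference_ids)["documents"]

    # query() accepts a list of query texts and returns n_results per text.
    result = collection.query(query_texts=reference_texts, n_results=n_results)
    return reference_texts, result
```

We would call it with the IDs of the two titles the user has seen, whatever those IDs are in the collection.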
4. Multiple query texts result
With multiple query texts, we still get back a single dictionary. The difference is that each key now holds a list of lists, with one inner list per query text. Looking at documents, for instance, the first list matches our horror movie and the second list seems to match the children's title.
Even though we asked for three results, we're getting six: three for each query text. Notice that the titles we used in the query are ranked highest, which makes sense, as they are the most similar to the query.
These could be removed in postprocessing by extracting the documents from the documents key and filtering out documents that are also in the reference_texts.
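One way this postprocessing could look is sketched below. The helper name drop_reference_texts and the toy result are made up for illustration; the only assumption about the query result is that its documents key holds one list per query text.

```python
def drop_reference_texts(result, reference_texts):
    """Return recommended documents per query, minus the reference texts."""
    seen = set(reference_texts)
    return [
        [doc for doc in docs if doc not in seen]
        for docs in result["documents"]
    ]


# Toy result shaped like a query response for two query texts (made-up data).
example_result = {
    "documents": [
        ["horror A", "horror B", "horror C"],
        ["kids A", "kids B", "kids C"],
    ]
}
filtered = drop_reference_texts(example_result, ["horror A", "kids A"])
print(filtered)  # [['horror B', 'horror C'], ['kids B', 'kids C']]
```

Because the reference items are dropped after the query, each inner list can end up shorter than the number of results we asked for.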
Another great way of filtering results is by utilizing metadata.
5. Adding metadata
So far, we've only worked with the IDs and documents from the netflix_titles CSV, but additional information, like the type (whether the title is a film or TV show) and release_year, is also available. This data could be very useful if, for example, we only wanted to recommend films released recently.
We've edited the code that loads the CSV: it now creates a list to store the metadatas and populates it with one dictionary per row, holding that row's type and release_year.
Like before, we also create a list of IDs so we can add the metadatas to the existing items.
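A minimal sketch of this loading step follows. To keep it self-contained, it reads a small in-memory stand-in with made-up rows rather than the real file, and it assumes the ID column is called show_id; with the actual CSV we would open the file instead.

```python
import csv
import io

# Stand-in for netflix_titles.csv with made-up rows; only the columns
# used here are included. The show_id column name is an assumption.
sample_csv = io.StringIO(
    "show_id,type,title,release_year\n"
    "s1,Movie,Example Film,2021\n"
    "s2,TV Show,Example Series,2015\n"
)

ids = []
metadatas = []
for row in csv.DictReader(sample_csv):
    ids.append(row["show_id"])
    metadatas.append({
        "type": row["type"],
        # Store the year as an int so numeric comparisons work later.
        "release_year": int(row["release_year"]),
    })

print(ids)
print(metadatas)
```

Converting release_year to an integer matters because the CSV reader returns every value as a string, and a numeric comparison on the year needs a number.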
6. Adding and querying metadatas
We can update the items with their metadatas using the update method, this time specifying the metadatas argument.
We can now use metadatas to filter our query. We are going to make the same search as before, but we'll include a where clause to indicate we only want to retrieve items where the "type" metadata is "Movie".
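Put together, the update and the filtered query could be sketched like this. It assumes collection is an existing Chroma collection whose items already use the given IDs; the helper name add_metadata_and_query is hypothetical.

```python
def add_metadata_and_query(collection, ids, metadatas, reference_texts, n_results=3):
    """Attach metadata to existing items, then query for movies only.

    `collection` is assumed to be an existing Chroma collection whose
    items already use the given ids; this helper name is hypothetical.
    """
    # update() changes existing items; only the metadatas are supplied here.
    collection.update(ids=ids, metadatas=metadatas)

    # The where clause keeps only items whose "type" metadata is "Movie".
    return collection.query(
        query_texts=reference_texts,
        n_results=n_results,
        where={"type": "Movie"},
    )
```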
7. Where operators
The where filter we used here is actually shorthand for "equals", which we can define explicitly using a nested dictionary with the $eq key.
There are several other operators we can use to support different comparisons, including greater than ($gt), less than ($lt), and not equal to ($ne).
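As a quick illustration, here is how these filters could be written as dictionaries. The variable names and the years are arbitrary examples, not values from the lesson.

```python
# Shorthand and explicit forms of the same equality filter.
where_shorthand = {"type": "Movie"}
where_explicit = {"type": {"$eq": "Movie"}}

# Other comparison operators follow the same nested pattern.
where_not_movie = {"type": {"$ne": "Movie"}}          # not equal to
where_after_2020 = {"release_year": {"$gt": 2020}}    # greater than
where_before_2000 = {"release_year": {"$lt": 2000}}   # less than
```

Any of these dictionaries can be passed as the where argument of a query.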
8. Multiple where filters
Finally, where filters can be combined with logical operators. In our case, we want titles of type "Movie" AND released after a certain year.
To do this, we combine two where filters with an "$and" operator: type equals "Movie" AND release_year greater than 2020.
To filter for results that meet at least one condition, we can use $or instead of $and.
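The combined filters might be written as follows. This is a sketch: collection is assumed to be an existing Chroma collection with type and release_year metadata, and the helper name query_with_filter is hypothetical.

```python
# Both conditions must hold: a movie released after 2020.
where_and = {
    "$and": [
        {"type": {"$eq": "Movie"}},
        {"release_year": {"$gt": 2020}},
    ]
}

# At least one condition must hold.
where_or = {
    "$or": [
        {"type": {"$eq": "Movie"}},
        {"release_year": {"$gt": 2020}},
    ]
}


def query_with_filter(collection, reference_texts, where, n_results=3):
    """Run the same multi-text query with a where filter (hypothetical helper)."""
    return collection.query(
        query_texts=reference_texts,
        n_results=n_results,
        where=where,
    )
```

Passing where_and restricts the results to recent movies, while where_or would also let through older movies and recent TV shows.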
9. Results
There we have it: our recommendations now only include movies released after 2020.
10. Let's practice!
And now, it's your turn!