Querying and updating the database
1. Querying and updating the database
We meet again! Let's continue our journey to learn about querying and updating our collection.
2. Querying the database
Similar to what we did manually in the previous chapter, we'll build a semantic search application, but this time, using a vector database. The approach is exactly the same: we have a query string and we want to find similar titles in our collection. Previously, we had to embed the query string to get a query vector, which was used to find similar embeddings in the dataset.
3. Querying the database
With Chroma, we'll let the collection do the embedding, so we can pass our query string directly and Chroma will take care of creating the embedding and performing the search.
4. Retrieve the collection
First, we need to retrieve our collection, which we can do with client.get_collection(), specifying the name of the collection to retrieve. Recall that when we created the collection, we specified the embedding function to use; it's important to specify the same function when retrieving the collection. This way, Chroma will use the same embedding function to create the query vector.
5. Querying the collection
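The retrieval step described above can be sketched as a small helper. This is a minimal sketch, not a definitive setup: the helper name `load_collection` is hypothetical, and the client and embedding function are passed in as arguments so that no particular Chroma configuration is assumed. Only `get_collection` with a name and an embedding function comes from the text.

```python
def load_collection(client, name, embedding_function):
    """Retrieve an existing collection by name.

    The embedding function should be the same one that was used when the
    collection was created, so query strings are embedded consistently.
    """
    return client.get_collection(
        name=name,
        embedding_function=embedding_function,
    )
```

With a real client this might be called as `load_collection(client, "my_titles", my_embedding_fn)`, where both the collection name and the embedding function variable are placeholders.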
To query the collection, we call collection.query(), passing our query string to the query_texts parameter. Note that this parameter is plural, so even if we have a single query string, we pass it in a list. To specify how many items to retrieve, we use the n_results parameter. Here's what's returned: it's not the simplest format, so let's break it down.
6. Query results (dict)
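The query call described above can be sketched as follows. The helper name `search` and its default of three results are assumptions for illustration; `query_texts` and `n_results` are the parameters named in the text.

```python
def search(collection, query_text, n_results=3):
    """Run a single-string semantic search against a collection."""
    # query_texts is plural, so the single query string is wrapped in a list.
    return collection.query(
        query_texts=[query_text],
        n_results=n_results,
    )
```

Because the collection was retrieved with its embedding function, the query string is passed as plain text and no query vector has to be computed by hand.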
First of all, it's a dictionary. It has entries for ids (the ids of the returned items), embeddings, documents, metadatas, and distances of the query results. The embeddings entry is empty, simply because Chroma doesn't return them by default. Aside from this, each of these entries has the same format; let's look at ids.
7. Query results (lists of list)
ids contains a list of lists. The reason for this is that the query method accepts a list of query texts - meaning we could use multiple query texts - we just happened to use one. Therefore, the results follow the same structure: the first list is the result of the first query. If we had multiple query texts, we would get back as many lists.
8. Query results (lists)
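To make the list-of-lists shape described above concrete, here is a small sketch using a hand-written stand-in for a result with two query texts. The ids and query strings are made up for illustration and do not come from a real collection.

```python
# Hand-written stand-in for the "ids" entry of a query result (values are made up).
query_texts = ["films about space", "romantic comedies"]
result = {
    "ids": [["s1", "s2"], ["s7", "s9"]],
}

# One inner list per query text, in the same order as the queries.
ids_by_query = dict(zip(query_texts, result["ids"]))

print(ids_by_query["romantic comedies"])  # ['s7', 's9']
```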
As we can see, this is the same for each entry - they all contain a list of lists. Since we only used a single query text, we are only looking at the first list: the list for the first query text. In this list, we find a format similar to the parameters of the add() method: the first id corresponds to the first document, metadata, and distance.
9. Updating a collection
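Since the entries line up position by position, the results for the first query text can be paired into one record per item. The sketch below assumes the dictionary keys described above; the helper name `first_query_hits` and the sample values are hypothetical.

```python
def first_query_hits(result):
    """Pair up id, document, metadata and distance for the first query text."""
    return [
        {"id": item_id, "document": doc, "metadata": meta, "distance": dist}
        for item_id, doc, meta, dist in zip(
            result["ids"][0],
            result["documents"][0],
            result["metadatas"][0],
            result["distances"][0],
        )
    ]


# Made-up sample in the shape described above (single query text).
sample = {
    "ids": [["s1", "s2"]],
    "embeddings": None,
    "documents": [["Title: Alpha", "Title: Beta"]],
    "metadatas": [[{"type": "Movie"}, {"type": "TV Show"}]],
    "distances": [[0.21, 0.35]],
}

hits = first_query_hits(sample)
print(hits[0]["id"], hits[0]["distance"])  # s1 0.21
```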
Next, let's see how to update a collection. Items in a collection can be updated with the update() method. The syntax is similar to collection.add(); in this example, we'll update the texts for items id-1 and id-2. Once again, Chroma will take care of creating the embeddings using the collection's embedding function.
10. Upserting a collection
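The update step described above might look like the sketch below. The helper name `update_texts` and the id-to-text mapping are assumptions for illustration; the underlying call is `collection.update()` with `ids` and `documents`, mirroring the arguments of `add()`.

```python
def update_texts(collection, new_texts):
    """Replace the document text of existing items.

    new_texts maps an item id to its replacement text, for example
    {"id-1": "updated text one", "id-2": "updated text two"}.
    """
    collection.update(
        ids=list(new_texts.keys()),
        documents=list(new_texts.values()),
    )
```

No embeddings are passed here, which matches the text: the collection's embedding function is expected to re-embed the new documents.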
Alternatively, if we're not sure whether the IDs are already present in the collection, we can use the upsert() method. upsert will add the items to the collection if their IDs aren't present, and update them if they are - a combination of the add and update methods.
11. Deleting
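The upsert behaviour described above can be sketched in the same style. The helper name `upsert_texts` is hypothetical; it simply forwards to `collection.upsert()` with `ids` and `documents`.

```python
def upsert_texts(collection, texts):
    """Add items whose ids are new and update items whose ids already exist.

    texts maps an item id to its document text.
    """
    collection.upsert(
        ids=list(texts.keys()),
        documents=list(texts.values()),
    )
```

This is convenient when loading data repeatedly, since the caller does not need to check first which ids are already stored.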
Similarly, we can delete from a collection using the collection.delete() method, specifying the ids of the items to remove. Finally, if we want to completely empty the whole database (that means all collections and items), we can use client.reset().
12. Let's practice!
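The delete and reset calls described above can be sketched as two small helpers. The helper names are hypothetical. As an assumption worth checking against the installed Chroma version, resetting may need to be explicitly enabled in the client settings before `client.reset()` is allowed.

```python
def remove_items(collection, ids):
    """Delete specific items from a collection by id."""
    collection.delete(ids=ids)


def wipe_database(client):
    """Remove every collection and every item. This cannot be undone."""
    client.reset()
```

Keeping the destructive reset in its own clearly named function makes it harder to call by accident.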
Time to put this into practice!