
Batching upserts

1. Batching upserts

Congrats on making it this far! We'll now explore performance tuning strategies for upserting to Pinecone indexes.

2. Upserting limitations

To manage requests to its API and maintain fair use for all users, Pinecone places limits on the rate and size of upsert requests. These limits differ depending on the account type, but some use cases require strategies to work around them. In this video, we'll discuss batching, which involves breaking large numbers of vectors into more manageable batches, or chunks.

3. Defining a chunking function

To start, we'll create a chunks() function that breaks our long list of vectors into smaller chunks of batch_size vectors each, 100 by default. The first step is to convert the list, which is an iterable, into an iterator with the iter() function. Recall that iterables are data types like strings, lists, and dictionaries whose elements can be extracted one at a time. Iterators only produce elements on demand and return data as a stream, which is exactly what we need here. Next, we create the first chunk by calling itertools.islice() on the iterator with the batch_size parameter, wrapping the result in a tuple. We then use a while loop that continues until there are no chunks left: it yields the current chunk, which can then be upserted, and updates the chunk with the next batch of vectors using the same code that defined the first chunk.
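Putting those steps together, here's a minimal sketch of the generator described above; the names chunks and batch_size follow the transcript, and the default batch size of 100 matches what's stated.

```python
import itertools

def chunks(iterable, batch_size=100):
    """Yield successive tuples of up to batch_size items from iterable."""
    it = iter(iterable)                              # convert the iterable into an iterator
    chunk = tuple(itertools.islice(it, batch_size))  # build the first chunk
    while chunk:                                     # continue until no chunks remain
        yield chunk                                  # hand the chunk to the caller for upserting
        chunk = tuple(itertools.islice(it, batch_size))  # grab the next batch of vectors
```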

4. Sequential batching

We'll first look at performing batched upserts sequentially, which involves splitting requests into chunks and sending them one by one to the index. We initialize our client and connect to the index. Then, we use our chunks() function to create an iterator and iterate over it, upserting each chunk of vectors until every chunk has been upserted. This solves the problem of rate and size limits, but comes at the cost of speed - it can be really slow! To speed up batch upsertion, parallelizing requests is usually the best option.
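As a rough sketch of the sequential approach, assuming a vectors list of records to upsert and a placeholder API key and index name:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("my-index")  # placeholder index name

# Upsert one chunk at a time; each request stays within rate and size limits
for chunk in chunks(vectors):
    index.upsert(vectors=chunk)
```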

5. Parallel batching

In parallel batching, many batches can be sent at the same time, or in parallel. To enable parallel requests, initialize the Pinecone client with the pool_threads parameter, which sets the maximum number of simultaneous requests. Next, we start a with statement to connect to the index, again specifying the pool_threads parameter. Then, we begin the upsertion, which we do asynchronously so that requests can be sent independently. This is indicated with the async_req parameter, which is set to True in this case. The rest of the code is a list comprehension that upserts each chunk created by our chunks() function and assigns the asynchronous results to an object. Finally, we call the .get() method on each asynchronous result to wait for and retrieve the responses. Although there's quite a bit of extra code here, batching in parallel can provide huge speed boosts.
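Here's a sketch of what that parallel pattern can look like; the thread count, API key, index name, and vectors list are placeholders, and the structure follows the steps described above:

```python
from pinecone import Pinecone

# pool_threads caps the number of simultaneous requests (20 is an arbitrary choice)
pc = Pinecone(api_key="YOUR_API_KEY", pool_threads=20)

with pc.Index("my-index", pool_threads=20) as index:
    # async_req=True sends each upsert without waiting for the previous one to finish
    async_results = [
        index.upsert(vectors=chunk, async_req=True)
        for chunk in chunks(vectors)
    ]
    # .get() blocks until each request completes and returns its response
    responses = [async_result.get() for async_result in async_results]
```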

6. Let's practice!

Time to begin batching!