Batching upserts in parallel
In this exercise, you'll practice ingesting vectors into the 'datacamp-index' Pinecone index in parallel. You'll need to connect to the index, upsert vectors in batches asynchronously, and check the updated metrics of the 'datacamp-index' index.
The chunks() helper function you created earlier is still available to use:
import itertools

def chunks(iterable, batch_size=100):
    """A helper function to break an iterable into chunks of size batch_size."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))
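As a quick sanity check, chunks() can be run on a small range to confirm how the batch boundaries fall (a minimal example; the exercise's vectors variable isn't needed here):

```python
import itertools

def chunks(iterable, batch_size=100):
    """Break an iterable into tuples of at most batch_size items."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))

# Five items in batches of two: the final batch is shorter.
batches = list(chunks(range(5), batch_size=2))
print(batches)  # [(0, 1), (2, 3), (4,)]
```

Note that the last batch simply holds whatever remains, so callers don't need the total count to divide evenly by the batch size.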
This exercise is part of the course Vector Databases for Embeddings with Pinecone.
Exercise instructions
- Initialize the Pinecone client to allow 20 simultaneous requests.
- Upsert the vectors in vectors asynchronously in batches of 200 vectors per request, configuring 20 simultaneous requests.
- Print the updated metrics of the 'datacamp-index' Pinecone index.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize the client
pc = Pinecone(api_key="____", ____)
index = pc.Index('datacamp-index')

# Upsert vectors in batches of 200 vectors
with pc.Index('datacamp-index', ____) as index:
    async_results = [____(vectors=chunk, ____) for chunk in chunks(vectors, batch_size=____)]
    [async_result.get() for async_result in async_results]

# Retrieve statistics of the connected Pinecone index
print(____)
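Running the real solution requires a live Pinecone API key, but the batching-with-parallelism pattern itself can be sketched with a thread pool and a stand-in upsert function. The fake_upsert name and the 1,000 dummy vectors below are illustrative assumptions, not part of the exercise; collecting futures and then blocking on them mirrors how the Pinecone client's async_req results are resolved with .get():

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def chunks(iterable, batch_size=100):
    """Break an iterable into tuples of at most batch_size items."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))

def fake_upsert(vectors):
    """Stand-in for an index upsert call: reports how many vectors it received."""
    return len(vectors)

# 1,000 dummy (id, values) pairs in place of real embeddings.
vectors = [(str(i), [0.0, 0.0]) for i in range(1000)]

# Submit each 200-vector batch to a pool of 20 workers (the exercise's
# "20 simultaneous requests"), then block on every future before summing.
with ThreadPoolExecutor(max_workers=20) as pool:
    async_results = [pool.submit(fake_upsert, chunk)
                     for chunk in chunks(vectors, batch_size=200)]
    upserted = sum(result.result() for result in async_results)

print(upserted)  # 1000
```

The key idea is the same as in the exercise: all batch requests are issued first so they run concurrently, and only afterwards does the code wait on each result.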