Upserting vectors for semantic search
Time to embed some text data and upsert the vectors and metadata into your 'pinecone-datacamp' index! You've been given a dataset named squad_dataset.csv, and a sample of 200 rows has been loaded into the DataFrame, df.
In this exercise, you don't need to create and use your own API key to interact with OpenAI's embedding models. A valid OpenAI client has already been created for you and assigned to the client variable.
Your task is to embed the text using OpenAI's API and upsert the embeddings and metadata into the Pinecone index under the 'squad_dataset' namespace.
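Before diving in, here is a minimal sketch of a single call to OpenAI's embedding endpoint with the client you've been given; the input string is made up for illustration.

# Minimal sketch, assuming `client` is the pre-built OpenAI client from this exercise
response = client.embeddings.create(
    input=["A made-up example sentence about vector databases."],
    model="text-embedding-3-small"
)
vector = response.data[0].embedding
print(len(vector))  # 1536: the default dimensionality of 'text-embedding-3-small'

Each item in response.data carries one embedding, returned in the same order as the input list.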
This exercise is part of the course Vector Databases for Embeddings with Pinecone.
Exercise instructions
- Initialize the Pinecone client with your API key (the OpenAI client is already available as client).
- Extract the 'id', 'text', and 'title' metadata from each row in the batch.
- Encode texts using 'text-embedding-3-small' from OpenAI with dimensionality 1536.
- Upsert the vectors and metadata to a namespace called 'squad_dataset' (the shape of that call is sketched just after this list).
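As a point of reference for the final step: the index's upsert method accepts an iterable of (id, values, metadata) tuples together with a namespace keyword. A small sketch, assuming index points at 'pinecone-datacamp' and vector is a 1536-dimensional embedding like the one produced above; the ID and metadata values are illustrative only.

# Sketch of a tuple-style upsert into a namespace (illustrative ID and metadata)
index.upsert(
    vectors=[
        ("example-id-0", vector, {"text_id": "example-question-id",
                                  "text": "A made-up passage.",
                                  "title": "Example_Title"})
    ],
    namespace="squad_dataset"
)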
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize the Pinecone client
pc = Pinecone(api_key="____")
index = pc.Index('pinecone-datacamp')
batch_limit = 100
for batch in np.array_split(df, len(df) / batch_limit):
    # Extract the metadata from each row
    metadatas = [{
        "text_id": row['____'],
        "text": row['____'],
        "title": row['____']} for _, row in batch.iterrows()]
    texts = batch['text'].tolist()
    ids = [str(uuid4()) for _ in range(len(texts))]
    # Encode texts using OpenAI
    response = ____(input=____, model="____")
    embeds = [np.array(x.embedding) for x in response.data]
    # Upsert vectors to the correct namespace
    ____(vectors=____(ids, embeds, metadatas), namespace=____)
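For reference, here is one way the completed loop could look under the same assumptions as the exercise (a pre-built OpenAI client, a 200-row df with 'id', 'text', and 'title' columns, and an existing 'pinecone-datacamp' index). Treat it as a sketch rather than the official solution, and substitute your own Pinecone API key.

# One possible completed version of the exercise (sketch, not the graded solution)
from uuid import uuid4
import numpy as np
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key
index = pc.Index('pinecone-datacamp')

batch_limit = 100
for batch in np.array_split(df, len(df) / batch_limit):
    # Extract the metadata from each row
    metadatas = [{
        "text_id": row['id'],
        "text": row['text'],
        "title": row['title']} for _, row in batch.iterrows()]
    texts = batch['text'].tolist()
    ids = [str(uuid4()) for _ in range(len(texts))]
    # Encode texts with OpenAI's 'text-embedding-3-small' (1536 dimensions by default)
    response = client.embeddings.create(input=texts, model="text-embedding-3-small")
    embeds = [np.array(x.embedding) for x in response.data]
    # Upsert (id, values, metadata) tuples into the 'squad_dataset' namespace
    index.upsert(vectors=zip(ids, embeds, metadatas), namespace="squad_dataset")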