
Upserting vectors for semantic search

Time to embed some text data and upsert the vectors and metadata into your 'pinecone-datacamp' index! You've been given a dataset named squad_dataset.csv, and a sample of 200 rows from it has been loaded into the DataFrame df.

For this exercise, you don't need to create your own API key to use OpenAI's embedding model: a valid OpenAI client has already been created for you and assigned to the variable client.

Your task is to embed the text with OpenAI's API and upsert the embeddings and metadata into the Pinecone index under the namespace squad_dataset.
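Pinecone's Python client accepts upserted vectors as (id, values, metadata) tuples, which is why the exercise zips three parallel lists together. The sketch below shows only that pairing step. It assumes two hypothetical rows standing in for df and uses made-up 3-dimensional vectors in place of real 1536-dimensional embeddings, so no OpenAI or Pinecone call is made.

```python
from uuid import uuid4

# Hypothetical rows standing in for a batch of the SQuAD sample in df
rows = [
    {"id": "q1", "text": "Paris is the capital of France.", "title": "France"},
    {"id": "q2", "text": "The Nile flows through Egypt.", "title": "Nile"},
]

# Metadata dicts with the same keys the exercise uses
metadatas = [
    {"text_id": row["id"], "text": row["text"], "title": row["title"]}
    for row in rows
]

# Toy 3-dimensional vectors; real text-embedding-3-small vectors have 1536 values
embeds = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# One random UUID string per text, used as the vector ID
ids = [str(uuid4()) for _ in rows]

# Each record is an (id, values, metadata) tuple
records = list(zip(ids, embeds, metadatas))
for vec_id, values, meta in records:
    print(vec_id, values, meta["title"])
```

Because the three lists are built from the same batch in the same order, zip keeps each ID, vector, and metadata dict aligned.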

This exercise is part of the course

Vector Databases for Embeddings with Pinecone


Exercise instructions

  • Initialize the Pinecone client with your API key (the OpenAI client is already available as client).
  • Extract the 'id', 'text', and 'title' metadata from each row in the batch.
  • Encode the texts using OpenAI's 'text-embedding-3-small' model, which produces 1536-dimensional vectors.
  • Upsert the vectors and metadata to a namespace called 'squad_dataset'.
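The loop in the exercise processes df in batches of batch_limit rows so that each embedding request and upsert stays small. Here is a pure-Python sketch of the same steps. It assumes 200 hypothetical rows in place of df and a hypothetical fake_embed function in place of the OpenAI call. fake_embed returns zero vectors of the right length, not real embeddings, and the batches are collected in a list instead of being sent to index.upsert.

```python
from uuid import uuid4

BATCH_LIMIT = 100
DIMENSION = 1536  # output size of text-embedding-3-small, as stated in the exercise


def fake_embed(texts):
    """Hypothetical stand-in for the OpenAI embeddings call: one zero vector per text."""
    return [[0.0] * DIMENSION for _ in texts]


# 200 hypothetical rows, matching the size of the sample loaded into df
rows = [
    {"id": f"id-{i}", "text": f"passage {i}", "title": f"title {i % 5}"}
    for i in range(200)
]

upserted = []  # collects what would be passed to index.upsert, one list per batch
for start in range(0, len(rows), BATCH_LIMIT):
    batch = rows[start:start + BATCH_LIMIT]
    metadatas = [
        {"text_id": r["id"], "text": r["text"], "title": r["title"]} for r in batch
    ]
    texts = [r["text"] for r in batch]
    ids = [str(uuid4()) for _ in texts]
    embeds = fake_embed(texts)
    # Check that every vector matches the index dimension before upserting
    assert all(len(e) == DIMENSION for e in embeds)
    upserted.append(list(zip(ids, embeds, metadatas)))

print(len(upserted), [len(b) for b in upserted])  # 2 [100, 100]
```

Stepping through the rows with range(0, len(rows), BATCH_LIMIT) never produces a batch larger than the limit and keeps a final partial batch. In contrast, np.array_split(df, len(df) / batch_limit) asks for a number of sections. That number truncates to 0 and raises an error when df has fewer than 100 rows. For 250 rows, it would give two batches of 125.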

Interactive exercise

Complete the sample code to finish this exercise.

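# Imports used by the exercise code
from uuid import uuid4

import numpy as np
from pinecone import Pinecone

# df (200-row sample of squad_dataset.csv) and client (OpenAI client) are preloaded

# Initialize the Pinecone client
pc = Pinecone(api_key="____")
index = pc.Index('pinecone-datacamp')

batch_limit = 100

# Split df into ceil(len(df) / batch_limit) chunks of at most batch_limit rows
for batch in np.array_split(df, int(np.ceil(len(df) / batch_limit))):
    # Extract the metadata from each row
    metadatas = [{
      "text_id": row['____'],
      "text": row['____'],
      "title": row['____']} for _, row in batch.iterrows()]
    texts = batch['text'].tolist()

    # One random UUID string per text, used as the vector ID
    ids = [str(uuid4()) for _ in range(len(texts))]

    # Encode texts using OpenAI
    response = ____(input=____, model="____")
    embeds = [np.array(x.embedding) for x in response.data]

    # Upsert vectors to the correct namespace
    ____(vectors=____(ids, embeds, metadatas), namespace=____)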