
Building a retrieval function

A key step in the Retrieval Augmented Generation (RAG) workflow is retrieving relevant data from the vector database. In this exercise, you'll write a custom function called retrieve() that handles this step, and you'll reuse it in the final exercise of the course.

This exercise is part of the course

Vector Databases for Embeddings with Pinecone


Exercise instructions

  • Initialize the Pinecone client with your API key (the OpenAI client is available as client).
  • Define the function retrieve that takes four parameters: query, top_k, namespace, and emb_model.
  • Embed the input query using the emb_model argument.
  • Query the index for the top_k vectors most similar to query_emb, including their metadata, within the namespace passed to the function as an argument.
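
The steps above can be sketched offline as follows. This is a minimal illustration, not the course's reference solution: FakeEmbeddings and FakeIndex are hypothetical stand-ins that only mimic the response shapes the exercise reads (response.data[0].embedding and a 'matches' list with metadata), and the records and URLs are made up. It assumes the blank embedding call is client.embeddings.create, the embeddings method of the OpenAI Python client.

```python
from types import SimpleNamespace


class FakeEmbeddings:
    """Hypothetical stand-in for client.embeddings; returns a fixed vector."""

    def create(self, input, model):
        # Mimic the shape the exercise reads: response.data[0].embedding
        return SimpleNamespace(data=[SimpleNamespace(embedding=[0.1, 0.2, 0.3])])


class FakeIndex:
    """Hypothetical stand-in for a Pinecone index with two made-up records."""

    def query(self, vector, top_k, namespace, include_metadata=True):
        matches = [
            {"id": "a", "score": 0.9,
             "metadata": {"text": "First chunk", "title": "Video A",
                          "url": "https://example.com/a"}},
            {"id": "b", "score": 0.7,
             "metadata": {"text": "Second chunk", "title": "Video B",
                          "url": "https://example.com/b"}},
        ]
        # Records are already ordered by score; keep only the first top_k
        return {"matches": matches[:top_k]}


# In the exercise these would be the real OpenAI client and Pinecone index
client = SimpleNamespace(embeddings=FakeEmbeddings())
index = FakeIndex()


def retrieve(query, top_k, namespace, emb_model):
    # Step 1: embed the query with the requested embedding model
    query_response = client.embeddings.create(input=query, model=emb_model)
    query_emb = query_response.data[0].embedding

    # Step 2: fetch the top_k nearest vectors, with metadata, from the namespace
    docs = index.query(vector=query_emb, top_k=top_k,
                       namespace=namespace, include_metadata=True)

    # Step 3: split each match into its text and a (title, url) source pair
    retrieved_docs = []
    sources = []
    for doc in docs["matches"]:
        retrieved_docs.append(doc["metadata"]["text"])
        sources.append((doc["metadata"]["title"], doc["metadata"]["url"]))
    return retrieved_docs, sources


documents, sources = retrieve(
    query="How to build next-level Q&A with OpenAI",
    top_k=1,
    namespace="youtube_rag_dataset",
    emb_model="text-embedding-3-small",
)
print(documents)
print(sources)
```

Swapping the two stand-ins for a real OpenAI client and a real Pinecone index should leave the body of retrieve() unchanged, since the function only depends on those two calls and their response shapes.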

Hands-on interactive exercise

Try this exercise by completing the sample code below.

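# Import the Pinecone client class (the exercise environment may already provide it)
from pinecone import Pinecone

# Initialize the Pinecone client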
pc = Pinecone(api_key="____")
index = pc.Index('pinecone-datacamp')

# Define a retrieve function that takes four arguments: query, top_k, namespace, and emb_model
def retrieve(query, top_k, namespace, emb_model):
    # Encode the input query using OpenAI
    query_response = ____(
        input=____,
        model=____
    )
    
    query_emb = query_response.data[0].embedding
    
    # Query the index using the query_emb
    docs = index.query(vector=____, top_k=____, namespace=____, include_metadata=True)
    
    retrieved_docs = []
    sources = []
    for doc in docs['matches']:
        retrieved_docs.append(doc['metadata']['text'])
        sources.append((doc['metadata']['title'], doc['metadata']['url']))
    
    return retrieved_docs, sources

documents, sources = retrieve(
    query="How to build next-level Q&A with OpenAI",
    top_k=3,
    namespace='youtube_rag_dataset',
    emb_model="text-embedding-3-small"
)
print(documents)
print(sources)
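
The loop in retrieve() indexes doc['metadata']['text'], doc['metadata']['title'], and doc['metadata']['url'] directly, so a record stored without one of those fields would raise a KeyError. As a hedged variation, not part of the exercise solution, the sketch below applies the same unpacking to a hand-written response and tolerates missing fields. The helper name unpack_matches and the sample records are hypothetical, and the response is a plain dict; real Pinecone response objects may need different access.

```python
def unpack_matches(docs):
    """Split a query response into texts and (title, url) source pairs."""
    retrieved_docs = []
    sources = []
    for doc in docs["matches"]:
        metadata = doc.get("metadata") or {}
        text = metadata.get("text")
        if text is None:
            # Nothing to return for this match, so skip it entirely
            continue
        retrieved_docs.append(text)
        # Missing title or url becomes None instead of raising KeyError
        sources.append((metadata.get("title"), metadata.get("url")))
    return retrieved_docs, sources


# Hand-written response: one complete record, one partial, one without metadata
sample_response = {
    "matches": [
        {"id": "a", "score": 0.91,
         "metadata": {"text": "Chunk one", "title": "Video A",
                      "url": "https://example.com/a"}},
        {"id": "b", "score": 0.85, "metadata": {"text": "Chunk two"}},
        {"id": "c", "score": 0.80},
    ]
}

sample_documents, sample_sources = unpack_matches(sample_response)
print(sample_documents)
print(sample_sources)
```

Whether to skip incomplete records or fail loudly is a design choice; failing loudly, as the exercise code does, can be preferable when every record is expected to carry all three fields.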