Building a retrieval function
A key step in the Retrieval Augmented Generation (RAG) workflow is retrieving relevant data from the vector database. In this exercise, you'll design a custom function called retrieve() that performs this step; you'll use it in the final exercise of the course.
This exercise is part of the course Vector Databases for Embeddings with Pinecone.
Exercise instructions
- Initialize the Pinecone client with your API key (the OpenAI client is available as client).
- Define the function retrieve that takes four parameters: query, top_k, namespace, and emb_model.
- Embed the input query using the emb_model argument.
- Retrieve the top_k vectors most similar to query_emb, including their metadata, and specify the namespace passed to the function as an argument.
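The last step, retrieving the top_k most similar vectors within one namespace, can be illustrated with a brute-force sketch. This is not how Pinecone works internally: a Pinecone index uses approximate nearest-neighbor search and whichever similarity metric the index was created with. Here cosine similarity, the `toy_query` function, and the `STORE` records are all hypothetical, chosen only to show what `top_k`, `namespace`, and `include_metadata` control.

```python
import math

# Hypothetical toy store: namespace -> list of (id, vector, metadata) records.
STORE = {
    "youtube_rag_dataset": [
        ("a", [1.0, 0.0], {"text": "Intro to embeddings", "title": "Video A", "url": "https://example.com/a"}),
        ("b", [0.8, 0.6], {"text": "Q&A with OpenAI", "title": "Video B", "url": "https://example.com/b"}),
        ("c", [0.0, 1.0], {"text": "Unrelated talk", "title": "Video C", "url": "https://example.com/c"}),
    ],
    "other_namespace": [
        ("d", [1.0, 0.0], {"text": "Only visible in this namespace", "title": "Video D", "url": "https://example.com/d"}),
    ],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def toy_query(vector, top_k, namespace, include_metadata=True):
    """Brute-force stand-in for a vector-index query (hypothetical, not Pinecone's code)."""
    records = STORE.get(namespace, [])  # only records in the requested namespace are searched
    matches = []
    for rid, vec, meta in records:
        match = {"id": rid, "score": cosine(vector, vec)}
        if include_metadata:
            match["metadata"] = meta  # attach metadata only when asked for
        matches.append(match)
    # Highest similarity first, then keep the top_k best matches
    matches.sort(key=lambda m: m["score"], reverse=True)
    return {"matches": matches[:top_k]}
```

For example, `toy_query([1.0, 0.0], top_k=2, namespace="youtube_rag_dataset")` returns records `a` and `b`; record `d` is never considered because it is stored in a different namespace.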
Interactive exercise
Complete the sample code to finish this exercise.
from pinecone import Pinecone

# Initialize the Pinecone client
# (the OpenAI client is already available as `client`)
pc = Pinecone(api_key="____")
index = pc.Index('pinecone-datacamp')

# Define a retrieve function that takes four arguments: query, top_k, namespace, and emb_model
def retrieve(query, top_k, namespace, emb_model):
    # Encode the input query using OpenAI
    query_response = ____(
        input=____,
        model=____
    )
    query_emb = query_response.data[0].embedding

    # Query the index using the query_emb
    docs = index.query(vector=____, top_k=____, namespace=____, include_metadata=True)

    # Collect each match's text plus a (title, url) pair identifying its source
    retrieved_docs = []
    sources = []
    for doc in docs['matches']:
        retrieved_docs.append(doc['metadata']['text'])
        sources.append((doc['metadata']['title'], doc['metadata']['url']))
    return retrieved_docs, sources

documents, sources = retrieve(
    query="How to build next-level Q&A with OpenAI",
    top_k=3,
    namespace='youtube_rag_dataset',
    emb_model="text-embedding-3-small"
)
print(documents)
print(sources)
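To exercise the same data flow (embed, query, unpack matches) without API keys or network access, one option is to pass the OpenAI client and the index into the function as extra arguments. This is a hedged variant, not the exercise's required signature: `FakeEmbeddings` and `FakeIndex` are hypothetical stubs that mimic only the response shapes the code above reads (`.data[0].embedding` and `['matches']`), and their vectors and matches are made up.

```python
from types import SimpleNamespace


class FakeEmbeddings:
    """Hypothetical stub shaped like client.embeddings.create(...) responses."""

    def create(self, input, model):
        # Deterministic 2-d "embedding"; a real model returns a much longer vector.
        vec = [float(len(input)), 1.0]
        return SimpleNamespace(data=[SimpleNamespace(embedding=vec)])


class FakeIndex:
    """Hypothetical stub shaped like index.query(...) results; records each call."""

    def __init__(self, matches):
        self.matches = matches
        self.calls = []

    def query(self, vector, top_k, namespace, include_metadata):
        self.calls.append({"vector": vector, "top_k": top_k,
                           "namespace": namespace, "include_metadata": include_metadata})
        return {"matches": self.matches[:top_k]}  # canned matches, already ranked


def retrieve(query, top_k, namespace, emb_model, client, index):
    """Same steps as retrieve() above, with the clients injected for testing."""
    query_emb = client.embeddings.create(input=query, model=emb_model).data[0].embedding
    docs = index.query(vector=query_emb, top_k=top_k, namespace=namespace, include_metadata=True)
    retrieved_docs, sources = [], []
    for doc in docs["matches"]:
        retrieved_docs.append(doc["metadata"]["text"])
        sources.append((doc["metadata"]["title"], doc["metadata"]["url"]))
    return retrieved_docs, sources


client = SimpleNamespace(embeddings=FakeEmbeddings())
index = FakeIndex([
    {"id": "1", "score": 0.9, "metadata": {"text": "chunk one", "title": "Video 1", "url": "https://example.com/1"}},
    {"id": "2", "score": 0.7, "metadata": {"text": "chunk two", "title": "Video 2", "url": "https://example.com/2"}},
    {"id": "3", "score": 0.5, "metadata": {"text": "chunk three", "title": "Video 3", "url": "https://example.com/3"}},
])
documents, sources = retrieve("test", 2, "youtube_rag_dataset", "text-embedding-3-small", client, index)
```

Injecting the clients keeps the retrieval logic identical while letting a test check exactly which vector, top_k, and namespace reached the index.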