Building a retrieval function
A key step in the Retrieval Augmented Generation (RAG) workflow is retrieving data from the database. In this exercise, you'll design a custom function called retrieve() that performs this step; you'll use it again in the final exercise of the course.
This exercise is part of the course
Vector Databases for Embeddings with Pinecone
Instructions
- Initialize the Pinecone client with your API key (the OpenAI client is available as client).
- Define the function retrieve that takes four parameters: query, top_k, namespace, and emb_model.
- Embed the input query using the emb_model argument.
- Retrieve the top_k vectors most similar to query_emb, including their metadata, from the namespace passed to the function as an argument.
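To make "the top_k vectors most similar to query_emb" concrete, here is a minimal brute-force sketch in plain Python. It is not how Pinecone works internally, and it assumes cosine similarity as the metric, which the exercise does not state. The helper names cosine_similarity and top_k_similar and the toy vectors are hypothetical, introduced only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query_emb, vectors, top_k):
    """Return the top_k (id, score) pairs, highest score first.

    vectors is a dict mapping a vector id to its embedding.
    This scans every vector, which is fine for a toy example only.
    """
    scored = [
        (vec_id, cosine_similarity(query_emb, emb))
        for vec_id, emb in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional "embeddings" for illustration
toy_vectors = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}

result = top_k_similar([1.0, 0.1], toy_vectors, top_k=2)
print([vec_id for vec_id, _ in result])  # ['a', 'c']
```

A vector database replaces the full scan with an index built for fast nearest-neighbor search, but the contract is the same: a query vector goes in, and the top_k closest stored vectors come out.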
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Initialize the Pinecone client
pc = Pinecone(api_key="____")
index = pc.Index('pinecone-datacamp')

# Define a retrieve function that takes four arguments: query, top_k, namespace, and emb_model
def retrieve(query, top_k, namespace, emb_model):
    # Encode the input query using OpenAI
    query_response = ____(
        input=____,
        model=____
    )
    query_emb = query_response.data[0].embedding

    # Query the index using the query_emb
    docs = index.query(vector=____, top_k=____, namespace=____, include_metadata=True)

    retrieved_docs = []
    sources = []
    for doc in docs['matches']:
        retrieved_docs.append(doc['metadata']['text'])
        sources.append((doc['metadata']['title'], doc['metadata']['url']))

    return retrieved_docs, sources

documents, sources = retrieve(
    query="How to build next-level Q&A with OpenAI",
    top_k=3,
    namespace='youtube_rag_dataset',
    emb_model="text-embedding-3-small"
)
print(documents)
print(sources)
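One possible way to fill in the blanks in the template above is sketched here. It assumes the embedding call is client.embeddings.create from the OpenAI Python client, which is one way to satisfy the "Encode the input query using OpenAI" step but may not be the course's reference solution. So that the sketch runs without API keys or network access, retrieve() takes two extra parameters, client and index, that are not in the exercise signature, and the FakeClient, FakeEmbeddings, and FakeIndex classes, their canned metadata, and the example.com URLs are all hypothetical stand-ins.

```python
from types import SimpleNamespace


def retrieve(query, top_k, namespace, emb_model, client, index):
    # Embed the query text (assumed call: OpenAI embeddings endpoint)
    query_response = client.embeddings.create(input=query, model=emb_model)
    query_emb = query_response.data[0].embedding

    # Ask the index for the top_k nearest vectors in the given namespace
    docs = index.query(
        vector=query_emb,
        top_k=top_k,
        namespace=namespace,
        include_metadata=True,
    )

    # Collect the text chunks and their (title, url) sources
    retrieved_docs = []
    sources = []
    for doc in docs["matches"]:
        retrieved_docs.append(doc["metadata"]["text"])
        sources.append((doc["metadata"]["title"], doc["metadata"]["url"]))
    return retrieved_docs, sources


# --- Hypothetical offline stand-ins, for illustration only ---
class FakeEmbeddings:
    def create(self, input, model):
        # Mimics the response shape used above: response.data[0].embedding
        return SimpleNamespace(data=[SimpleNamespace(embedding=[0.1, 0.2, 0.3])])


class FakeClient:
    embeddings = FakeEmbeddings()


class FakeIndex:
    def __init__(self):
        self.last_call = None

    def query(self, vector, top_k, namespace, include_metadata):
        # Record the arguments so the call can be inspected afterwards
        self.last_call = {"vector": vector, "top_k": top_k, "namespace": namespace}
        matches = [
            {"metadata": {"text": "chunk one", "title": "Video A", "url": "https://example.com/a"}},
            {"metadata": {"text": "chunk two", "title": "Video B", "url": "https://example.com/b"}},
        ]
        return {"matches": matches[:top_k]}


fake_index = FakeIndex()
documents, sources = retrieve(
    query="How to build next-level Q&A with OpenAI",
    top_k=1,
    namespace="youtube_rag_dataset",
    emb_model="text-embedding-3-small",
    client=FakeClient(),
    index=fake_index,
)
print(documents)
print(sources)
```

In the exercise environment, the preloaded client and the index created from pc.Index('pinecone-datacamp') would take the place of the fakes, and the function would keep its four-parameter signature.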