Building a retrieval function
A key step in the Retrieval Augmented Generation (RAG) workflow is retrieving data from the database. In this exercise, you'll design a custom function called retrieve() that performs this crucial step; you'll put it to use in the final exercise of the course.
This exercise is part of the course
Vector Databases for Embeddings with Pinecone
Exercise instructions
- Initialize the Pinecone client with your API key (the OpenAI client is available as client).
- Define the function retrieve that takes four parameters: query, top_k, namespace, and emb_model.
- Embed the input query using the emb_model argument.
- Retrieve the top_k most similar vectors to query_emb, with metadata, specifying the namespace provided to the function as an argument.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Initialize the Pinecone client
pc = Pinecone(api_key="____")
index = pc.Index('pinecone-datacamp')
# Define a retrieve function that takes four arguments: query, top_k, namespace, and emb_model
def retrieve(query, top_k, namespace, emb_model):
    # Encode the input query using OpenAI
    query_response = ____(
        input=____,
        model=____
    )
    query_emb = query_response.data[0].embedding

    # Query the index using the query_emb
    docs = index.query(vector=____, top_k=____, namespace=____, include_metadata=True)

    retrieved_docs = []
    sources = []
    for doc in docs['matches']:
        retrieved_docs.append(doc['metadata']['text'])
        sources.append((doc['metadata']['title'], doc['metadata']['url']))
    return retrieved_docs, sources

documents, sources = retrieve(
    query="How to build next-level Q&A with OpenAI",
    top_k=3,
    namespace='youtube_rag_dataset',
    emb_model="text-embedding-3-small"
)
print(documents)
print(sources)