
RAG chatbot with Pinecone and OpenAI

1. RAG chatbot with Pinecone and OpenAI

Welcome back! In this video, we'll implement a question-answering chatbot using Pinecone and OpenAI.

2. Retrieval Augmented Generation (RAG)

Crucial to this is a process called Retrieval-Augmented Generation, or RAG. It's a system architecture designed to improve question-answering models by providing them with additional information. In RAG, a user query is embedded and used to retrieve the most relevant documents from the database. Then, these documents are added to the model's prompt so that the model has extra context to inform its response.

3. Initialize Pinecone and OpenAI

We start by importing the necessary libraries, initializing the Pinecone and OpenAI clients, and connecting to our index.
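A minimal sketch of this setup, assuming the API keys are stored in environment variables and that the index is called "youtube-rag" (both assumptions, not from the video):

```python
import os
from pinecone import Pinecone
from openai import OpenAI

# Initialize the Pinecone and OpenAI clients
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Connect to an existing index; the index name here is a placeholder
index = pc.Index("youtube-rag")
```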

4. YouTube transcripts

We'll work with a dataset containing YouTube transcripts. It includes columns such as text ID, channel ID, video title, text, and more. Later, we'll save all this information as metadata.

5. Ingesting documents

We'll upsert this dataset in batches of 100, splitting it with NumPy's array_split() function. For each batch, we extract the metadata using a list comprehension, which builds a metadata dictionary for each row in the batch, assigning the row values to dictionary keys. We also extract the transcript text to embed and create unique IDs for each row using the uuid4() function. Next, we send a request to OpenAI to embed these texts, extracting the raw embeddings from the response. Finally, we zip together the IDs, vectors, and metadata dictionaries and upsert them to the youtube_rag_dataset namespace.
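A rough sketch of this ingestion loop, assuming the dataset is loaded in a pandas DataFrame named df, that the columns match the names below, and that text-embedding-3-small is the embedding model (all assumptions):

```python
import numpy as np
from uuid import uuid4

batch_size = 100
num_batches = int(np.ceil(len(df) / batch_size))

for batch in np.array_split(df, num_batches):
    # One metadata dictionary per row in the batch (column names assumed)
    metadatas = [
        {"text_id": row["text_id"], "channel_id": row["channel_id"],
         "title": row["title"], "text": row["text"], "url": row["url"]}
        for _, row in batch.iterrows()
    ]
    texts = batch["text"].tolist()
    ids = [str(uuid4()) for _ in range(len(batch))]

    # Embed the transcript texts and pull out the raw embedding vectors
    response = client.embeddings.create(input=texts, model="text-embedding-3-small")
    embeds = [item.embedding for item in response.data]

    # Zip ids, vectors, and metadata together and upsert into the namespace
    index.upsert(vectors=list(zip(ids, embeds, metadatas)),
                 namespace="youtube_rag_dataset")
```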

6. Retrieval function

Now that our documents are ingested, let's implement a function to retrieve relevant documents from a Pinecone index based on a query. It uses the same OpenAI embeddings function to encode the user query, extracting the embeddings from the API response. The query embeddings are used to query our 'youtube_rag_dataset' namespace to retrieve the most relevant documents. For each retrieved document, we extract the text and source metadata, appending them to the retrieved_docs and sources lists, respectively. Finally, it returns the lists of retrieved documents and their corresponding sources.
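One possible shape for such a function, reusing the client and index objects from the setup sketch and the metadata keys assumed above:

```python
def retrieve(query, top_k=3, namespace="youtube_rag_dataset",
             emb_model="text-embedding-3-small"):
    # Embed the user query with the same embedding model used at ingestion
    response = client.embeddings.create(input=query, model=emb_model)
    query_emb = response.data[0].embedding

    # Query the namespace for the most relevant documents
    results = index.query(vector=query_emb, top_k=top_k,
                          namespace=namespace, include_metadata=True)

    # Collect the document text and its source (title and URL) for each match
    retrieved_docs, sources = [], []
    for match in results["matches"]:
        retrieved_docs.append(match["metadata"]["text"])
        sources.append((match["metadata"]["title"], match["metadata"]["url"]))
    return retrieved_docs, sources
```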

7. Retrieval output

Here's the function in action, retrieving the transcripts based on the user's query along with the video's title and URL.

8. Prompt with context builder function

Now that we have the top_k documents, we'll construct a prompt_with_context_builder() function to create a contextual prompt for our model. It concatenates an instruction stored in prompt_start, the most relevant documents, joined with the .join() method, and the user query in prompt_end.
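A sketch of what such a builder could look like; the exact instruction wording and the document separator are assumptions:

```python
def prompt_with_context_builder(query, docs):
    prompt_start = "Answer the question based on the context below.\n\nContext:\n"
    prompt_end = f"\n\nQuestion: {query}\nAnswer:"

    # Join the retrieved documents and wrap them between the instruction and the query
    return prompt_start + "\n\n---\n\n".join(docs) + prompt_end
```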

9. Prompt with context builder output

Here's prompt_with_context_builder() in action. The final prompt includes the retrieved documents followed by the user's query. This prompt will be used to generate the model response.

10. Question-answering function

Finally, let's create a question_answering() function. It takes a prompt string and the document sources, and calls the OpenAI chat completions endpoint. We'll define two prompts: a system prompt, setting the model's behavior, and a user prompt, which will be the output from our prompt_with_context_builder() function. The function returns the final answer along with the document sources.
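A minimal sketch, assuming the gpt-4o-mini chat model, the system prompt wording, and the (title, url) source tuples from the retrieval sketch (all assumptions):

```python
def question_answering(prompt, sources, chat_model="gpt-4o-mini"):
    # System prompt sets the model's behavior; the user prompt carries the context
    sys_prompt = "You are a helpful assistant that answers questions about YouTube videos."

    res = client.chat.completions.create(
        model=chat_model,
        messages=[
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": prompt},
        ],
        temperature=0,
    )
    answer = res.choices[0].message.content.strip()

    # Append the document sources so the answer can be traced back to the videos
    answer += "\n\nSources:\n" + "\n".join(f"- {title}: {url}" for title, url in sources)
    return answer
```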

11. Question-answering output

Here's the function in action. Note there are three sources, as we retrieved the top three documents, or transcript segments, based on the user's query.

12. Putting it all together

Putting it all together, here's how the functions chain from the user query to the final generated answer.
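An illustrative end-to-end call chain using the sketched functions above; the query text is just an example, not from the video:

```python
query = "How do I build a RAG chatbot?"  # example query

docs, sources = retrieve(query, top_k=3)
prompt = prompt_with_context_builder(query, docs)
print(question_answering(prompt, sources))
```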

13. Let's practice!

Time to give this a go!