1. RAG storage and retrieval using vector databases
Welcome back! Now that we've covered document loading and splitting, we'll round out the RAG workflow by learning how to store and retrieve this information using vector databases.
2. RAG development steps
We've now loaded documents and split them into chunks using an appropriate chunk_size and chunk_overlap. All that's left is to store them for retrieval.
3. What is a vector database and why do I need it?
We'll be using a vector database to store our documents and make them available for retrieval.
This requires embedding our text documents to create vectors that capture the semantic meaning of the text. Then, a user query can be embedded to retrieve the most similar documents from the database and insert them into the model prompt.
4. Which vector database should I use?
There are many vector databases available in LangChain.
When deciding which solution to choose, consider whether an open-source solution is required, which may be the case if high customizability is needed. Also consider whether the data can be stored off-premises on third-party servers - not all use cases will permit this.
The amount of storage and the latency of retrieving results are also key considerations. Sometimes a lightweight in-memory database will be sufficient, but other cases will require something more powerful. In this course, we will use ChromaDB because it is lightweight and quick to set up.
5. Meet the documents...
We'll be storing documents containing guidelines for a company's marketing copy.
There are two guidelines: one around brand capitalization, and another on how to refer to users.
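As a rough sketch, the two guideline documents might look like this once parsed into LangChain Document objects; the exact wording and brand name here are hypothetical, just to make the later code self-contained:

```python
from langchain_core.documents import Document

# Hypothetical guideline documents; the wording and brand name are illustrative only
docs = [
    Document(page_content="Brand capitalization: always write the brand name as 'ExampleCo', never 'exampleco' or 'EXAMPLECO'."),
    Document(page_content="Referring to users: always call them 'users', never 'customers' or 'clients'."),
]
```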
6. Setting up a Chroma vector database
Now that we've parsed the data, it's time to embed it.
We'll use an embedding model from OpenAI by instantiating the OpenAIEmbeddings class, passing in our openai_api_key.
To create a Chroma database from a set of documents, call the .from_documents() method on the Chroma class, passing the documents and embedding function to use.
We'd like to persist this database to disk for future use, so provide a path to the persist_directory argument.
Finally, to integrate the database with other LangChain components, we need to convert it into a retriever with the .as_retriever() method. Here, we specify that we want to perform a similarity search and return the top two most similar documents for each user query.
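Here's a minimal sketch of that setup, assuming the langchain-openai and langchain-chroma packages and that openai_api_key is already defined; your import paths may differ depending on your LangChain version:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Embedding model used to convert documents and queries into vectors
embedding_function = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Create the Chroma database from the documents and persist it to disk
vectorstore = Chroma.from_documents(
    docs,
    embedding=embedding_function,
    persist_directory="path/to/directory",  # placeholder path
)

# Convert the database into a retriever returning the top two most similar documents
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2},
)
```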
7. Building a prompt template
So the model knows what to do, we'll construct a prompt template. It starts with an instruction to review and fix the copy provided, inserts the retrieved guidelines and the copy to review, and ends with an indication that the model should follow with a fixed version.
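A sketch of such a prompt template is shown below; the exact instruction wording is illustrative:

```python
from langchain_core.prompts import ChatPromptTemplate

# Instruction with placeholders for the retrieved guidelines and the copy to review
message = """Review the marketing copy and fix any breaches of the guidelines.

Guidelines:
{guidelines}

Copy to review:
{copy}

Fixed copy:"""

prompt_template = ChatPromptTemplate.from_messages([("human", message)])
```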
8. Chaining it all together!
To chain together our retriever, prompt_template, and LLM, we use LCEL in a similar way as before, using pipes to connect the three components. The only difference is that we create a dictionary that assigns the retrieved documents to guidelines, and assigns the copy to review to RunnablePassthrough, which acts as a placeholder to insert our input when we invoke the chain.
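As a sketch, assuming an llm such as ChatOpenAI has already been instantiated, the chain might look like this:

```python
from langchain_core.runnables import RunnablePassthrough

# Retrieved documents fill "guidelines"; the input copy passes through to "copy"
rag_chain = (
    {"guidelines": retriever, "copy": RunnablePassthrough()}
    | prompt_template
    | llm
)

# Invoke the chain with the copy to review (placeholder text shown here)
response = rag_chain.invoke("Some marketing copy to review...")
print(response.content)
```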
Printing the result, we can see the model fixed the two guideline breaches.
9. Let's practice!
Let's cement this with some exercises!