
RAG versus fine-tuning

1. RAG versus fine-tuning

During development, we often need to incorporate proprietary data and external knowledge.

2. LLM lifecycle: RAG versus fine-tuning

Let's compare two methodologies: Retrieval Augmented Generation, or RAG, and fine-tuning.

3. Retrieval Augmented Generation (RAG)

RAG is a common LLM design pattern, combining the model's reasoning abilities with external factual knowledge. RAG consists of three steps in a chain: Retrieve related documents, augment the prompt with these documents, and generate the output. This allows the LLM to use external data for better results. The retrieve step is crucial, as dealing with large knowledge bases can be challenging. Thankfully, there are ready-made solutions, often implemented using vector databases.
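
As a minimal sketch of this three-step chain in Python, consider the following; the helpers search_documents() and call_llm() are hypothetical stand-ins for a real vector-database query and a real LLM API call:

    # A minimal sketch of the retrieve-augment-generate chain.
    # search_documents() and call_llm() are hypothetical placeholders.

    def search_documents(query: str, k: int = 2) -> list[str]:
        # Placeholder retrieval: a real system would query a vector database.
        knowledge_base = [
            "Refunds are available within 30 days of purchase.",
            "Support hours are Monday to Friday, 9am to 5pm.",
        ]
        return knowledge_base[:k]

    def call_llm(prompt: str) -> str:
        # Placeholder generation: a real system would call an LLM API here.
        return f"(model answer based on a prompt of {len(prompt)} characters)"

    def rag_answer(question: str) -> str:
        docs = search_documents(question)       # 1. retrieve
        prompt = (
            "Answer the question using only this context:\n"
            + "\n".join(docs)
            + f"\n\nQuestion: {question}"
        )                                       # 2. augment
        return call_llm(prompt)                 # 3. generate

    print(rag_answer("When can I get a refund?"))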

4. RAG-chain with vector database

Let's create a RAG-chain with vector databases. First, we convert the input into a numerical representation called an embedding, which captures its meaning. Similar meanings yield similar embeddings. Embeddings are created using pre-trained models. Next, we search our vector database containing all embeddings. We compare the input embedding with those of the documents and calculate their similarity. Finally, we retrieve the most similar documents.
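
As an illustration, the retrieve step can be implemented with a pre-trained embedding model and cosine similarity; the sentence-transformers model name below is just an example, and any embedding model would work:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Example pre-trained embedding model; other models would work too.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    documents = [
        "Refunds are available within 30 days of purchase.",
        "Support hours are Monday to Friday, 9am to 5pm.",
        "Shipping to Europe takes 5 to 7 business days.",
    ]

    # Embed the documents and the input query as numerical vectors.
    doc_embeddings = model.encode(documents)          # shape: (n_docs, dim)
    query_embedding = model.encode("How long do refunds take?")

    # Cosine similarity between the query and every document.
    sims = doc_embeddings @ query_embedding / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )

    # Retrieve the most similar documents (top 2 here).
    for i in np.argsort(sims)[::-1][:2]:
        print(f"{sims[i]:.2f}  {documents[i]}")

A vector database performs this same similarity comparison, but with indexing that stays fast across millions of documents.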

5. RAG-chain with vector database

The augment step combines the input with these documents to create the final prompt.
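
For example, a simple prompt template for the augment step might look like this; the wording of the template is only an illustration:

    def augment(question: str, retrieved_docs: list[str]) -> str:
        # Combine the retrieved documents and the user input into one prompt.
        context = "\n\n".join(retrieved_docs)
        return (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\nAnswer:"
        )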

6. RAG-chain with vector database

The generate step uses this prompt to create an output. When implementing RAG, we face several implementation choices, especially regarding the embedding model. Open-source and proprietary options vary in quality, cost, and ease of use. Certain models work well with particular kinds of text or particular languages but not others, making experimentation and testing crucial.
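
As one example of the generate step, here is a call to a chat-completion API using the OpenAI Python client; the model name is only an example, and any provider's chat model could fill this role. This reuses the augment() helper from the sketch above:

    from openai import OpenAI

    client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

    augmented_prompt = augment(
        "How long do refunds take?",
        ["Refunds are available within 30 days of purchase."],
    )

    # Generate the final output from the augmented prompt.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": augmented_prompt}],
    )
    print(response.choices[0].message.content)

Let's now explore fine-tuning as another method to incorporate external information.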

7. Fine-tuning

Unlike RAG, which incorporates factual knowledge, fine-tuning adjusts the LLM's weights using our own data. This process expands the model's reasoning capabilities to specific tasks and new domains, such as different languages or specialized fields.

8. Fine-tuning

There are two main approaches to fine-tuning, each requiring different types of data. Supervised fine-tuning, a form of transfer learning, needs demonstration data containing input prompts paired with desired outputs. This approach retrains parts of the model using this new data. The second approach is reinforcement learning from human feedback, typically done after supervised fine-tuning. This requires human-labeled data, such as rankings or quality scores obtained from likes and dislikes. We then train an extra reward model to predict output quality and optimize the original LLM to maximize this predicted quality. Both approaches require decision-making and can be complex.
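
As a rough sketch, supervised fine-tuning with the Hugging Face transformers library might look like the following; the small model and the two demonstration examples are placeholders, and a real run would need far more data. An RLHF stage would additionally require ranking data and a separate reward model, which is beyond a short sketch.

    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "distilgpt2"  # small example model; any causal LM works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Demonstration data: input prompts paired with desired outputs.
    texts = [
        "Question: What is our refund window? Answer: 30 days.",
        "Question: When is support open? Answer: Weekdays, 9am to 5pm.",
    ]
    encodings = tokenizer(texts, truncation=True, padding=True,
                          return_tensors="pt")

    class DemoDataset(torch.utils.data.Dataset):
        def __len__(self):
            return encodings["input_ids"].shape[0]
        def __getitem__(self, idx):
            item = {key: val[idx] for key, val in encodings.items()}
            item["labels"] = item["input_ids"].clone()  # causal LM objective
            return item

    # Retrain the model's weights on the demonstration data.
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=DemoDataset(),
    )
    trainer.train()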

9. RAG or fine-tuning

So when should we use RAG, and when should we use fine-tuning? RAG is used to include factual knowledge. It retains all capabilities of the original LLM without altering it, is easy to implement, and keeps the application up-to-date as long as the external database is current. However, it adds extra components to the application, requiring careful engineering. Fine-tuning is used when specializing in a new domain, as it offers full control over the model without needing additional components during deployment. Yet it needs labeled data and specialized knowledge to implement. Depending on the training data used, it may degrade the application's performance and amplify data biases. It can even cause the model to forget previously learned knowledge, a problem known as catastrophic forgetting.

10. The development cycle

Let's return to the development cycle.

11. The development cycle

With the addition of external databases, we can use RAG to bring this data into our application.

12. The development cycle

Fine-tuning allows us to tailor the LLM to new domains.

13. Let's practice!

Let's review RAG and fine-tuning.
