1. Introduction to RLHF
Welcome to this course on reinforcement learning from human feedback, or RLHF!
2. Welcome to the course!
My name is Mina Parham, and I'm an AI Engineer who applies large language models, or LLMs, to different domains and is passionate about solving problems with reinforcement learning and RLHF.
Throughout this course, we'll explore how reinforcement learning can be enhanced by human feedback,
3. Welcome to the course!
and how it can be applied in systems such as language models to improve outputs.
4. Reinforcement learning review
Let's briefly review some essential concepts from Reinforcement Learning, or RL, and Deep Reinforcement Learning, or DRL.
In RL and DRL,
5. Reinforcement learning review
an agent performs
6. Reinforcement learning review
actions in an environment, and its behavior is defined by
7. Reinforcement learning review
a policy that maximizes a reward, aligning the agent's behavior with desired outcomes over time.
Different policies can be used, and in deep reinforcement learning, neural networks are used to represent these policies.
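To make the agent-environment loop concrete, here is a minimal sketch using the gymnasium package and its CartPole-v1 environment; both are assumptions for illustration, since the course doesn't prescribe a specific environment, and the "policy" here just samples random actions.

```python
import gymnasium as gym

# Minimal agent-environment loop (illustrative assumption: gymnasium's CartPole-v1).
env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)

total_reward = 0.0
for _ in range(100):
    # A trivial policy: pick a random action. In deep RL, a neural network
    # would map the observation to an action instead.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        observation, info = env.reset()

env.close()
print(f"Total reward collected: {total_reward}")
```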
8. From RL to RLHF
In RLHF, the agent, which is a language model,
9. From RL to RLHF
operates in an environment defined by the prompts and the output text it generates.
10. From RL to RLHF
The policy, which dictates the model's behavior, is shaped by human feedback: this feedback is used to train what is called a 'reward model', which helps align model outputs with human values. This is helpful in contexts where specific knowledge is needed or multiple valid responses exist.
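As a rough sketch of how human feedback becomes a reward model: annotators compare pairs of responses, and the reward model is trained to score the preferred response higher. The snippet below shows the pairwise preference loss commonly used for this, with made-up reward scores standing in for a real model's outputs.

```python
import torch
import torch.nn.functional as F

# Made-up reward-model scores for two responses to the same prompt:
# one the human annotator preferred, one they rejected.
reward_chosen = torch.tensor([1.7, 0.9])
reward_rejected = torch.tensor([0.3, 1.1])

# Pairwise preference loss: training pushes the reward model to score
# the human-preferred response above the rejected one.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
print(f"Preference loss: {loss.item():.4f}")
```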
11. LLM fine-tuning in RLHF
But the RLHF process includes a few more steps: to start with,
12. LLM fine-tuning in RLHF
the initial LLM needs to be fine-tuned to minimize the effort required by human evaluators.
13. The full RLHF process
The initial LLM can then be used to get started with RLHF.
Imagine we need a model to answer questions about English authors.
14. The full RLHF process
Our initial model, when asked who wrote Romeo and Juliet, gives a vague answer.
15. The full RLHF process
So, we fine-tune another model, called the policy model,
16. The full RLHF process
in a feedback loop, passing its outputs to the reward model that has been trained with human feedback.
17. The full RLHF process
After a few iterations, we get the response we're looking for: Shakespeare, which is a more precise answer than the one from our initial LLM.
18. The full RLHF process
Finally, we run a check comparing the outputs of the two models, to ensure that, while the model improves, it doesn't change too drastically from the initial LLM.
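Putting the pieces together, here is a toy, purely illustrative sketch of the feedback loop just described; the helper functions are hypothetical stand-ins, not real training code. The policy model answers the prompt, the reward model scores the answer, and the output is compared against the initial model so the change stays controlled.

```python
# Toy sketch of the RLHF loop described above; all functions are hypothetical
# stand-ins for the real models, not actual training code.

def initial_model(prompt: str) -> str:
    # Frozen initial LLM, used as a reference for comparison.
    return "Romeo and Juliet is a famous English play."

def policy_model(prompt: str) -> str:
    # The LLM being fine-tuned in the feedback loop.
    return "Romeo and Juliet was written by William Shakespeare."

def reward_model(prompt: str, response: str) -> float:
    # Stand-in for a reward model trained on human preferences:
    # here it simply rewards answers that name the author.
    return 1.0 if "Shakespeare" in response else 0.0

prompt = "Who wrote Romeo and Juliet?"
response = policy_model(prompt)
reward = reward_model(prompt, response)

# The final check: compare against the initial model's output so that
# improvements don't come with drastic, unwanted changes in behavior.
print(f"Reward: {reward}")
print(f"Initial model: {initial_model(prompt)}")
print(f"Policy model:  {response}")
```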
19. Interacting with RLHF-tuned LLMs
The good news is that readily available RLHF-trained models exist. You can find them on Hugging Face, and you can read their model card and README file to check whether they've been trained using RLHF. In this example, we'll generate new data to populate a social media sentiment dataset, and then classify the results using a language model. We use the Transformers library's pipeline to load a model that has been pre-trained with RLHF. We then pass a sentence, or prompt, to the model so that it generates a continuation of the text for us. Finally, we can print the result.
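Here is a minimal sketch of that generation step. The model name below is only a placeholder, and the prompt and generation settings are examples; in practice you would swap in a checkpoint whose model card confirms it was trained with RLHF.

```python
from transformers import pipeline

# Load a text-generation model from the Hugging Face Hub. "gpt2" is only a
# placeholder; swap in a checkpoint whose model card says it was trained with RLHF.
generator = pipeline("text-generation", model="gpt2")

# Pass a prompt and let the model generate a continuation of the text
# for the social media sentiment dataset.
prompt = "I just finished watching the new episode and"
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)

# Print the result.
print(result[0]["generated_text"])
```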
20. Interacting with RLHF-tuned LLMs
We can use other models to classify the data that the RLHF-trained model generated. By running sentiment analysis, we can see how positive or negative the generated text is.
For this step, we'll use pipeline as well as AutoModelForSequenceClassification and AutoTokenizer to instantiate a pre-trained model and tokenizer.
Then we initialize a sentiment analyzer, to which we pass the generated data.
Finally, we run the prediction and get our label.
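A sketch of that classification step is below. The checkpoint name is an assumption (a commonly used SST-2 sentiment model); the course may use a different one, and the example sentence stands in for the text generated in the previous step.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Instantiate a pre-trained sentiment model and its tokenizer. This checkpoint
# is an assumption; any sequence-classification sentiment model would do.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize the sentiment analyzer.
sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# In practice this would be the text produced by the RLHF-trained model above.
generated_text = "I just finished watching the new episode and it was fantastic!"

# Run the prediction and get the label.
prediction = sentiment_analyzer(generated_text)
print(prediction[0]["label"], prediction[0]["score"])
```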
21. Let's practice!
Now, let's experiment with models fine-tuned using RLHF!