
Introduction to RLHF

1. Introduction to RLHF

Welcome to this course on reinforcement learning from human feedback, or RLHF!

2. Welcome to the course!

My name is Mina Parham, and I'm an AI Engineer who applies large language models, or LLMs, to different domains and is passionate about solving problems with reinforcement learning and RLHF. Throughout this course, we'll explore how reinforcement learning can be enhanced by human feedback,

3. Welcome to the course!

and how it can be applied in systems such as language models to improve outputs.

4. Reinforcement learning review

Let's briefly review some essential concepts from Reinforcement Learning, or RL, and Deep Reinforcement Learning, or DRL. In RL and DRL,

5. Reinforcement learning review

an agent performs

6. Reinforcement learning review

actions in an environment, and its behavior is defined by

7. Reinforcement learning review

a policy that maximizes a reward, aligning the agent's behavior with desired outcomes over time. Different policies can be used, and in deep reinforcement learning, neural networks are used to represent the policy.
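
To make this loop concrete, here is a minimal sketch in Python; `env`, `policy`, and `run_episode` are illustrative placeholders rather than any specific library's API.

```python
# Minimal sketch of the agent-environment loop; `env` and `policy` are
# illustrative placeholders, not a specific library API.
def run_episode(env, policy):
    state = env.reset()                          # start a new episode
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)                   # the policy maps states to actions
        state, reward, done = env.step(action)   # the environment reacts and rewards
        total_reward += reward                   # the agent aims to maximize this
    return total_reward
```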

8. From RL to RLHF

In RLHF, the agent, which is a language model,

9. From RL to RLHF

operates in an environment defined by the prompts and the output text it generates.

10. From RL to RLHF

The policy, which dictates its behavior, is shaped by human feedback: human preferences are used to train what is called a 'reward model', which helps align the model's output with human values. This is helpful in contexts where specific knowledge is needed or multiple valid responses exist.
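
As an illustration of what a reward model does, here is a small sketch; `reward_model` is a hypothetical callable trained on human preference data, not a real library object.

```python
# Hypothetical reward model scoring candidate responses to a prompt;
# `reward_model` stands in for a model trained on human preference data.
prompt = "Who wrote Romeo and Juliet?"
responses = [
    "It was written a long time ago.",   # vague answer
    "William Shakespeare wrote it.",     # precise answer
]

scores = [reward_model(prompt, response) for response in responses]
best = responses[scores.index(max(scores))]  # the response humans would prefer
print(best)
```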

11. LLM fine-tuning in RLHF

But the RLHF process includes a few more steps: to start with,

12. LLM fine-tuning in RLHF

the initial LLM needs to be fine-tuned to minimize the effort required by human evaluators.
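
A rough sketch of this supervised fine-tuning step is shown below; `demonstrations`, `fine_tune`, and `initial_llm` are hypothetical names, and in practice this step is done with a training library such as Hugging Face Transformers.

```python
# Conceptual sketch of supervised fine-tuning on human-written demonstrations;
# `fine_tune`, `initial_llm`, and the data below are hypothetical placeholders.
demonstrations = [
    {"prompt": "Who wrote Romeo and Juliet?",
     "response": "Romeo and Juliet was written by William Shakespeare."},
    # ... more prompt-response pairs written or approved by humans
]

fine_tuned_llm = fine_tune(initial_llm, demonstrations)
```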

13. The full RLHF process

The initial LLM can then be used to get started with RLHF. Imagine we need a model to answer questions about English authors.

14. The full RLHF process

Our initial model, when asked who wrote Romeo and Juliet, gives a vague answer.

15. The full RLHF process

So, we fine-tune another model, called the policy model,

16. The full RLHF process

in a feedback loop, passing its outputs to the reward model that has been trained with human feedback.

17. The full RLHF process

After a few iterations, we get the response we're looking for: Shakespeare, which is a more precise answer than the one from our initial LLM.

18. The full RLHF process

Finally, we run a check comparing the two models' outputs to ensure that, while the model improves, it doesn't change too drastically.
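
Putting these steps together, here is a conceptual sketch of one iteration of the loop; every name is an illustrative placeholder (in practice the drift check is typically a KL-divergence penalty, and libraries such as TRL implement the full training loop).

```python
# Conceptual sketch of one RLHF iteration; all names are illustrative
# placeholders, not a specific library API.
def rlhf_step(policy_model, reward_model, initial_llm, prompt):
    response = policy_model.generate(prompt)     # policy model proposes an answer
    reward = reward_model(prompt, response)      # human-feedback-trained reward model scores it

    # Drift check: penalize responses that diverge too much from the
    # initial LLM so the model improves without changing too drastically.
    reference = initial_llm.generate(prompt)
    reward -= drift_penalty(response, reference)

    update_policy(policy_model, prompt, response, reward)
```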

19. Interacting with RLHF-tuned LLMs

The good news is that RLHF-trained models are readily available. You can find them on Hugging Face, and you can read a model's card and readme file to check whether it has been trained using RLHF. In this example, we'll generate new data to populate a social media sentiment dataset, and then classify the results using a language model. We use the Transformers library's pipeline to load a model that has been pre-trained with RLHF. We then pass a sentence, or prompt, to the model to generate a continuation of the text for us. Finally, we can print the result.
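
Here is a minimal sketch of that generation step; the model name is a placeholder, so swap in a checkpoint whose Hugging Face model card confirms RLHF training.

```python
from transformers import pipeline

# Load a text-generation model; the name below is a placeholder for a
# checkpoint whose model card confirms it was trained with RLHF.
generator = pipeline("text-generation", model="your-rlhf-tuned-model")

# Prompt the model to continue the text, e.g. to create entries for a
# social media sentiment dataset.
prompt = "Just tried the new coffee shop downtown and"
result = generator(prompt, max_new_tokens=30)

print(result[0]["generated_text"])
```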

20. Interacting with RLHF-tuned LLMs

We can use other models to classify the data that the RLHF-trained model generated. By running sentiment analysis, we can see how positive or negative the generated text is. For this step, we'll use pipeline as well as AutoModelForSequenceClassification and AutoTokenizer to instantiate a pre-trained model and tokenizer. Then we initialize a sentiment analyzer, to which we pass the generated data. Finally, we run the prediction and get our label.
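
A sketch of this classification step could look like the following; the checkpoint shown is one common sentiment model rather than the only option, and the generated sentence stands in for the text produced in the previous step.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Instantiate a pre-trained sentiment model and its tokenizer; this
# checkpoint is one common choice, not the only option.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Initialize the sentiment analyzer and classify the generated text.
sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
generated_text = "Just tried the new coffee shop downtown and loved every minute of it!"
prediction = sentiment_analyzer(generated_text)

print(prediction[0]["label"], prediction[0]["score"])
```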

21. Let's practice!

Now, let's experiment with models fine-tuned using RLHF!