

Exercise

Text generation with RLHF

In this exercise, you will work with lvwerra/gpt2-imdb-pos-v2, a model trained with RLHF. You will review how to construct a Hugging Face pipeline and then use it to test one use case for RLHF-trained models: generating movie reviews.

The pipeline, AutoModelForCausalLM, and AutoTokenizer objects have been pre-imported from transformers. The tokenizer has been pre-loaded.

Instructions

  • Set the model name to lvwerra/gpt2-imdb-pos-v2, the RLHF-trained model.
  • Use the pipeline function to create a text-generation pipeline.
  • Use the text generation pipeline to generate a continuation of the review provided.
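
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution: the helper names build_generator and continue_review, the MODEL_NAME constant, and the max_new_tokens default are assumptions made for this example, and the tokenizer is loaded explicitly here because the sketch cannot rely on the pre-loaded one.

```python
MODEL_NAME = "lvwerra/gpt2-imdb-pos-v2"  # model name given in the exercise


def build_generator(model_name: str = MODEL_NAME):
    """Create a text-generation pipeline for the given checkpoint.

    Hypothetical helper for illustration; it downloads weights on first use.
    """
    # Imported inside the function so the sketch can be defined without the
    # library installed; the exercise environment has these pre-imported.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return pipeline("text-generation", model=model, tokenizer=tokenizer)


def continue_review(generator, prompt: str, max_new_tokens: int = 30) -> str:
    """Return the prompt followed by the generated continuation."""
    # A text-generation pipeline returns a list of dicts, one per sequence,
    # each holding the full text under the "generated_text" key.
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]
```

To try it, call build_generator() once, then pass the result and the opening of a review to continue_review. The review text itself comes from the exercise and is not reproduced here. Because generation may involve sampling, the continuation can differ between runs.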