
Exercise

Train with LoRA

You wanted to begin RLHF fine-tuning but kept encountering out-of-memory errors. Although you switched to loading the model in 8-bit precision, the error persisted. To address this, you decided to take the next step and apply LoRA for more efficient fine-tuning.

The following have already been pre-imported:

  • The model loaded in 8-bit precision as pretrained_model_8bit
  • LoraConfig and get_peft_model from peft
  • AutoModelForCausalLMWithValueHead from trl

Instructions

100 XP
  • Set the LoRA dropout to 0.1 and the bias type to lora_only.
  • Add the LoRA configuration to the model.
  • Set up the model with a value head for PPO training.