
Exercise

Prepare for 8-bit Training

You wanted to begin RLHF fine-tuning, but you kept running into out-of-memory errors. To address this, you decided to switch to 8-bit precision, which enables more memory-efficient fine-tuning, using the Hugging Face peft library.

The following have been pre-imported:

  • AutoModelForCausalLM from transformers
  • prepare_model_for_int8_training from peft
  • AutoModelForCausalLMWithValueHead from trl

Instructions

100 XP
  • Load the pre-trained model and make sure to include the parameter for 8-bit precision.
  • Use the prepare_model_for_int8_training function to prepare the model for LoRA-based fine-tuning.
  • Load the model with a value head for PPO training.
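A minimal sketch of these three steps is shown below. The checkpoint name ("gpt2") is a placeholder, not the one used in the exercise, and 8-bit loading assumes the bitsandbytes package is available; the imports are repeated here so the snippet is self-contained.

```python
from transformers import AutoModelForCausalLM
from peft import prepare_model_for_int8_training
from trl import AutoModelForCausalLMWithValueHead

# Placeholder checkpoint; substitute the model used in your own setup
model_name = "gpt2"

# Step 1: load the pre-trained model in 8-bit precision
# (load_in_8bit=True relies on the bitsandbytes integration)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True)

# Step 2: prepare the 8-bit model for LoRA-based fine-tuning
# (casts layer norms to fp32 and enables gradient checkpointing, among other fixes)
model = prepare_model_for_int8_training(model)

# Step 3: wrap the prepared model with a value head for PPO training
ppo_model = AutoModelForCausalLMWithValueHead.from_pretrained(model)
```

The resulting ppo_model can then be passed to a PPO trainer, since the value head provides the per-token value estimates PPO needs during optimization.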