Train with LoRA
You wanted to begin RLHF fine-tuning but kept encountering out-of-memory errors. Although you switched to loading the model in 8-bit precision, the error persisted. To address this, you decided to take the next step and apply LoRA for more efficient fine-tuning.
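For reference, a model like pretrained_model_8bit can be loaded in 8-bit precision roughly as follows. This is a minimal sketch; the checkpoint name and quantization setup are assumptions, since the course pre-loads the model for you.

# Illustrative sketch of an 8-bit load (checkpoint name is a placeholder)
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

pretrained_model_8bit = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # placeholder checkpoint, not the course's model
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto")  # 8-bit loading places weights on the available device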
The following have already been pre-imported:
- The model loaded in 8-bit precision as pretrained_model_8bit
- LoraConfig and get_peft_model from peft
- AutoModelForCausalLMWithValueHead from trl
This exercise is part of the course Reinforcement Learning from Human Feedback (RLHF).
Exercise instructions
- Set the LoRA dropout to 0.1 and the bias type to "lora_only".
- Add the LoRA configuration to the model.
- Set up the model with a value head for PPO training.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Set the configuration parameters
config = LoraConfig(
r=32,
lora_alpha=32,
lora_dropout=____,
bias=____)
# Apply the LoRA configuration to the 8-bit model
lora_model = get_peft_model(pretrained_model_8bit, ____)
# Set up the model with a value head for PPO training
model = ____.from_pretrained(____)
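One plausible completion of the blanks, following the instructions above (a sketch, not the graded solution; the r and lora_alpha values come from the starter code, and "lora_only" is peft's bias option string):

# Set the configuration parameters
config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="lora_only")

# Apply the LoRA configuration to the 8-bit model
lora_model = get_peft_model(pretrained_model_8bit, config)

# Wrap the LoRA model with a value head for PPO training
model = AutoModelForCausalLMWithValueHead.from_pretrained(lora_model)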