
Train with LoRA

You wanted to begin RLHF fine-tuning but kept running into out-of-memory errors. Even after switching to loading the model in 8-bit precision, the errors persisted. To train more efficiently, you decided to take the next step and apply LoRA.

The following have already been pre-imported (a sketch of how they might be set up follows the list):

  • The model loaded in 8-bit precision as pretrained_model_8bit
  • LoraConfig and get_peft_model from peft
  • AutoModelForCausalLMWithValueHead from trl
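
For context, here is a minimal sketch of how these objects could be created. This is an assumption for illustration only: the base model name ("gpt2") and the quantization setup are hypothetical and not part of the exercise environment, which provides these objects for you.

# A sketch of the assumed setup (the exercise pre-imports these for you)
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from trl import AutoModelForCausalLMWithValueHead

# Load the base model in 8-bit precision (the model name is hypothetical)
pretrained_model_8bit = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True))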

This exercise is part of the course Reinforcement Learning from Human Feedback (RLHF).

Exercise instructions

  • Set the LoRA dropout to 0.1 and the bias type to "lora_only".
  • Add the LoRA configuration to the model.
  • Set up the model with a value head for PPO training.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Set the configuration parameters
config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=____,
    bias=____)

# Apply the LoRA configuration to the 8-bit model
lora_model = get_peft_model(pretrained_model_8bit, ____)

# Set up the model with a value head for PPO training
model = ____.from_pretrained(____)
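
If you get stuck, here is a sketch of one possible completion. The dropout and bias values come from the exercise instructions above, and the remaining blanks reuse the pre-imported names; it is not necessarily the official solution.

# One possible completion (a sketch based on the instructions above)
config = LoraConfig(
    r=32,               # rank of the low-rank update matrices
    lora_alpha=32,      # scaling factor applied to the LoRA update
    lora_dropout=0.1,   # dropout on the LoRA layers, per the instructions
    bias="lora_only")   # train only the LoRA layers' bias terms

# Apply the LoRA configuration to the 8-bit model
lora_model = get_peft_model(pretrained_model_8bit, config)

# Wrap the LoRA model with a value head for PPO training
model = AutoModelForCausalLMWithValueHead.from_pretrained(lora_model)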