
Train with LoRA

You wanted to begin RLHF fine-tuning but kept running into out-of-memory errors. Even after loading the model in 8-bit precision, the errors persisted, so you decided to take the next step and apply LoRA for more memory-efficient fine-tuning.

The following have already been pre-loaded:

  • The model loaded in 8-bit precision as pretrained_model_8bit (a loading sketch follows this list)
  • LoraConfig and get_peft_model from peft
  • AutoModelForCausalLMWithValueHead from trl
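
For context, here is a minimal sketch of how a model can be loaded in 8-bit precision with transformers and bitsandbytes. The checkpoint name gpt2 is a placeholder assumption; the exercise environment loads its own model as pretrained_model_8bit:

# Hypothetical loading step (not part of the exercise):
# quantize a causal LM to 8-bit weights on load
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)  # requires bitsandbytes
pretrained_model_8bit = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # placeholder checkpoint, not the course's actual model
    quantization_config=quant_config,
    device_map="auto")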

This exercise is part of the course

Reinforcement Learning from Human Feedback (RLHF)

Exercise instructions

  • Set the LoRA dropout to 0.1 and the bias type to lora_only.
  • Add the LoRA configuration to the model.
  • Set up the model with a value head for PPO training.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Set the configuration parameters
config = LoraConfig(
    r=32,  
    lora_alpha=32,  
    lora_dropout=____,  
    bias=____)  

# Apply the LoRA configuration to the 8-bit model
lora_model = get_peft_model(pretrained_model_8bit, ____)
# Set up the model with a value head for PPO training
model = ____.from_pretrained(____)
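
One way to complete the blanks, following the instructions above. This is a sketch of a possible solution, assuming the pre-loaded objects described earlier:

# Set the configuration parameters
config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.1,    # LoRA dropout, as instructed
    bias="lora_only")    # train bias parameters of LoRA layers only

# Apply the LoRA configuration to the 8-bit model
lora_model = get_peft_model(pretrained_model_8bit, config)

# Set up the model with a value head for PPO training
model = AutoModelForCausalLMWithValueHead.from_pretrained(lora_model)

With the value head attached, the model returns both language-model logits and a scalar value estimate per token, which PPO uses to compute advantages during RLHF training.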