Preparing for 8-bit training

You wanted to start RLHF fine-tuning but kept running into out-of-memory errors. To work around them, you decided to switch to 8-bit precision, which makes fine-tuning more memory-efficient, using Hugging Face's peft library.
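
To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes a rounded figure of about 124 million parameters for the smallest GPT-2 checkpoint and counts weight storage only; activations, gradients, and optimizer state are ignored, and in practice some layers may stay in higher precision even when 8-bit loading is requested.

```python
# Rough estimate of weight storage at different precisions.
# Assumption: ~124 million parameters for the smallest GPT-2 checkpoint.
GPT2_PARAMS = 124_000_000

# Bytes needed to store a single parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(num_params, dtype):
    """Return the approximate weight memory in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_mb(GPT2_PARAMS, dtype):.0f} MB")
```

Under these assumptions, storing the weights in 8 bits takes about a quarter of the space needed in 32-bit floating point.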

The following have already been imported:

AutoModelForCausalLM from transformers
prepare_model_for_int8_training from peft
AutoModelForCausalLMWithValueHead from trl

This exercise is part of the course

Reinforcement Learning from Human Feedback (RLHF)

Exercise instructions

Load the pre-trained model, making sure to include the parameter for 8-bit precision.
Use the prepare_model_for_int8_training function to prepare the model for LoRA-based fine-tuning.
Load the model with a value head for PPO training.
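
As an illustration of what "8-bit precision" means for a weight, the sketch below applies simple absmax quantization: every value is divided by a shared scale and rounded to an integer code in [-127, 127]. This is a simplified, hypothetical example with made-up weights, not the scheme the libraries above actually implement, which is more elaborate.

```python
def quantize_absmax(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Made-up weights, for illustration only.
weights = [0.12, -0.5, 0.33, 1.27, -0.98]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)

# The round trip is lossy: each value is off by at most half a scale step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each code fits in one byte, at the cost of a small rounding error per weight.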

Interactive hands-on exercise

Try to solve this exercise by completing the sample code.

model_name = "gpt2"

# Load the model in 8-bit precision
pretrained_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    ____=True,
)

# Prepare the model for fine-tuning
pretrained_model_8bit = ____(pretrained_model)

# Load the model with a value head
model = ____.from_pretrained(pretrained_model_8bit)
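
One possible completion of the template above is sketched below, wrapped in a hypothetical helper named load_8bit_model_with_value_head. It assumes the blanks map to load_in_8bit, to the already-imported prepare_model_for_int8_training, and to AutoModelForCausalLMWithValueHead. It also assumes library versions where these names are available: 8-bit loading generally needs the bitsandbytes package and a compatible GPU, and more recent peft releases have superseded prepare_model_for_int8_training with prepare_model_for_kbit_training.

```python
def load_8bit_model_with_value_head(model_name="gpt2"):
    """Hypothetical helper that fills in the three blanks of the exercise."""
    # Imports are done inside the function so that defining the helper
    # does not itself require the libraries to be installed.
    from transformers import AutoModelForCausalLM
    from peft import prepare_model_for_int8_training
    from trl import AutoModelForCausalLMWithValueHead

    # Blank 1: ask transformers to load the weights in 8-bit precision.
    pretrained_model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,
    )

    # Blank 2: prepare the quantized model for LoRA-based fine-tuning.
    pretrained_model_8bit = prepare_model_for_int8_training(pretrained_model)

    # Blank 3: wrap the prepared model with a value head for PPO training.
    return AutoModelForCausalLMWithValueHead.from_pretrained(pretrained_model_8bit)
```

Calling load_8bit_model_with_value_head("gpt2") on a suitably equipped machine would return the wrapped model that the exercise assigns to model.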

This chapter introduces the basics of Reinforcement Learning from Human Feedback (RLHF), a technique that uses human input to help AI models learn more effectively. Get started with RLHF by understanding how it differs from traditional reinforcement learning and why human feedback can enhance AI performance in various domains.

Exercise 1: Introduction to RLHF
Exercise 2: Text generation with RLHF
Exercise 3: Classifying generated text for RLHF
Exercise 4: RL vs. RLHF
Exercise 5: Exploring pre-trained LLMs
Exercise 6: Tokenize a text dataset
Exercise 7: Fine-tuning for review classification
Exercise 8: Preparing data for RLHF
Exercise 9: Preparing the preference dataset
Exercise 10: Extracting prompts

Discover how to set up systems for gathering human feedback in this chapter. Learn best practices for collecting high-quality data, from pairwise comparisons to uncertainty sampling, and explore strategies for improving your data collection.

Exercise 1: Methods for high-quality feedback gathering
Exercise 2: Understanding comparison and rating in RLHF
Exercise 3: Comparing slogans for a gym campaign
Exercise 4: Measuring feedback quality and relevance
Exercise 5: Low confidence
Exercise 6: K-means for feedback clustering
Exercise 7: Active learning
Exercise 8: Implementing an active learning pipeline
Exercise 9: Active learning loop

In this chapter, you'll get into the core of Reinforcement Learning from Human Feedback training. This includes fine-tuning with PPO, techniques for training efficiently, and handling cases where training diverges from the objectives your metrics are meant to capture.

Exercise 1: Exploring reward models
Exercise 2: Initializing the reward
Exercise 3: Configuring the reward trainer
Exercise 4: Training with PPO
Exercise 5: Initialize the PPO trainer
Exercise 6: Fine-tuning with PPO
Exercise 7: Efficient fine-tuning optimization in RLHF
Exercise 8: Preparing for 8-bit training (current exercise)
Exercise 9: Train with LoRA

Explore key techniques for assessing and improving model performance in this last chapter of Reinforcement Learning from Human Feedback (RLHF). From fine-tuning metrics to incorporating diverse feedback sources, you'll get a comprehensive toolkit for refining your models effectively.

Exercise 1: Model metrics and adjustments
Exercise 2: Mitigating negative KL divergence
Exercise 3: Checking the reward model
Exercise 4: Incorporating diverse feedback sources
Exercise 5: Majority voting on multiple data sources
Exercise 6: Unreliable data source identification
Exercise 7: Evaluating RLHF models
Exercise 8: Interpreting curves
Exercise 9: Evaluating RLHF with metrics
Exercise 10: Wrapping up your RLHF journey