This chapter introduces the basics of Reinforcement Learning from Human Feedback (RLHF), a technique that uses human input to guide how AI models learn. Get started by understanding how RLHF differs from traditional reinforcement learning and why human feedback can improve AI performance across domains.

Introduction to RLHF

Text generation with RLHF

Classifying generated text for RLHF

RL vs. RLHF

Exploring pre-trained LLMs

Tokenize a text dataset

Fine-tuning for review classification

Preparing data for RLHF

Preparing the preference dataset

Extracting prompts

Foundational Concepts

Discover how to set up systems for gathering human feedback in this chapter. Learn best practices for collecting high-quality data, from pairwise comparisons to uncertainty sampling, and explore strategies for improving your data collection.
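As a rough illustration of uncertainty sampling, the sketch below picks the items a model is least confident about so they can be sent to human annotators first. It assumes the model outputs one probability list per item; the function name `least_confident` and the sample probabilities are hypothetical and not taken from the course.

```python
def least_confident(probabilities, k):
    """Return the indices of the k items whose top class probability is lowest."""
    # Confidence of an item = probability of its most likely class.
    confidence = [max(p) for p in probabilities]
    # Rank item indices from least to most confident.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical model outputs for four unlabeled items (two classes each).
probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]

# Items 3 and 1 have the lowest top-class probability, so they go to annotators first.
to_label = least_confident(probs, k=2)
print(to_label)  # [3, 1]
```

Other uncertainty measures, such as entropy or the margin between the top two classes, could be swapped in for the `max` call without changing the rest of the selection logic.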

Methods for high-quality feedback gathering

Understanding comparison and rating in RLHF

Comparing slogans for a gym campaign

Measuring feedback quality and relevance

Low confidence

K-means for feedback clustering

Active learning

Implementing an active learning pipeline

Active learning loop

Gathering Human Feedback

In this chapter, you'll get into the core of Reinforcement Learning from Human Feedback training. This includes fine-tuning with PPO, techniques for training efficiently, and handling cases where training diverges from the objectives your metrics are meant to track.
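To make the efficiency argument concrete, here is a small arithmetic sketch of why a low-rank adapter such as LoRA trains far fewer parameters than full fine-tuning. It assumes the usual LoRA formulation, where the update to a weight matrix of shape d_out x d_in is factored into B (d_out x rank) and A (rank x d_in). The layer sizes and function names are illustrative and not taken from the course.

```python
def full_params(d_out, d_in):
    """Number of parameters in a dense weight matrix of shape d_out x d_in."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes for a single projection layer (assumed, not from the course).
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
ratio = lora / full

print(full)   # 16777216
print(lora)   # 65536
print(ratio)  # 0.00390625
```

With these assumed sizes the adapter trains 1/256 of the layer's parameters. The actual savings depend on the rank and on which layers receive adapters.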

Exploring reward models

Initialize the reward

Configure the reward trainer

Training with PPO

Initialize the PPO trainer

Fine-tuning with PPO

Efficient fine-tuning in RLHF

Preparing for 8-bit training

Train with LoRA

Tuning Models with Human Feedback

Explore key techniques for assessing and improving model performance in this final chapter of Reinforcement Learning from Human Feedback (RLHF). From fine-tuning metrics to incorporating diverse feedback sources, you'll build a toolkit for refining your models effectively.
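One simple way to combine labels from several feedback sources is majority voting. The sketch below is a minimal version under stated assumptions: each item has a list of labels, one per source, and ties are left unresolved (returned as `None`) rather than broken arbitrarily. The function name, labels, and data are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label, or None if there are no labels or the top labels tie."""
    if not labels:
        return None
    counts = Counter(labels).most_common()
    # A tie between the two most frequent labels means there is no clear majority.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical labels collected from several feedback sources per response.
votes_per_item = {
    "response_1": ["good", "good", "bad"],
    "response_2": ["bad", "good"],
    "response_3": ["bad", "bad", "bad"],
}

consensus = {item: majority_vote(votes) for item, votes in votes_per_item.items()}
print(consensus)
# {'response_1': 'good', 'response_2': None, 'response_3': 'bad'}
```

Items that come back as `None` are natural candidates for another round of review, and a source that frequently disagrees with the consensus may be worth inspecting for reliability.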

Model metrics and adjustments

Mitigating negative KL divergence

Checking the reward model

Incorporating diverse feedback sources

Majority voting on multiple data sources

Unreliable data source identification

Evaluating RLHF models

Interpreting curves

Evaluating RLHF with metrics

Wrapping up your RLHF journey

Model Evaluation

In this course on Reinforcement Learning from Human Feedback, you'll combine the efficiency of Generative AI with human expertise. You'll learn how to make GenAI models truly reflect human values and preferences while getting hands-on practice with LLMs. You'll also work through the complexities of reward models and discover how to develop LLMs to create AI that not only learns but also adapts to real-world scenarios.

Deep Reinforcement Learning in Python

Learn to build GenAI models that reflect human values while gaining hands-on experience with advanced LLMs.

Reinforcement Learning from Human Feedback (RLHF)

Discover how to make GenAI models truly reflect human values while getting hands-on practice with advanced LLMs.

Developing Large Language Models

Reinforcement Learning in Python
