This chapter introduces the basics of Reinforcement Learning from Human Feedback (RLHF), a technique that uses human input to help AI models learn more effectively. Get started with RLHF by understanding how it differs from traditional reinforcement learning and why human feedback can enhance AI performance in various domains.

Introduction to RLHF

Text generation with RLHF

Classifying generated text for RLHF

RL vs. RLHF

Exploring pre-trained LLMs

Tokenize a text dataset

Fine-tuning for review classification

Preparing data for RLHF

Preparing the preference dataset

Extracting prompts

Foundational Concepts

Discover how to set up systems for gathering human feedback in this chapter. Learn best practices for collecting high-quality data, from pairwise comparisons to uncertainty sampling, and explore strategies for enhancing your data collection.

Methods for high-quality feedback gathering

Understanding comparison and rating in RLHF

Comparing slogans for a gym campaign

Measuring feedback quality and relevance

Low confidence

K-means for feedback clustering

Active learning

Implementing an active learning pipeline

Active learning loop

Gathering Human Feedback

In this chapter, you'll get into the core of Reinforcement Learning from Human Feedback training. This includes exploring fine-tuning with PPO, techniques for training efficiently, and ways to handle potential divergence from your metrics' objectives.

Exploring the reward model

Initializing the reward

Setting up the reward trainer

Training with PPO

Initializing the PPO trainer

Fine-tuning with PPO

Efficient fine-tuning in RLHF

Setting up for 8-bit training

Training with LoRA

Tuning Models with Human Feedback

Explore key techniques for assessing and improving model performance in this last chapter of Reinforcement Learning from Human Feedback (RLHF). From fine-tuning metrics to incorporating diverse feedback sources, you'll gain a comprehensive toolkit for refining your models effectively.

Model metrics and adjustments

Mitigating negative KL divergence

Checking the reward model

Incorporating diverse feedback sources

Majority voting on multiple data sources

Unreliable data source identification

Evaluating RLHF models

Interpreting curves

Evaluating RLHF with metrics

Wrapping up your RLHF journey

Model Evaluation

Combine the efficiency of Generative AI with the understanding of human expertise in this Reinforcement Learning from Human Feedback course. You'll learn how to make GenAI models truly reflect human values and preferences while gaining hands-on experience with LLMs. You'll also navigate the complexities of reward models and learn how to build on LLMs to produce AI that not only learns but also adapts to real-world scenarios.

Deep Reinforcement Learning in Python

Learn how to make GenAI models reflect human values while practicing with advanced LLMs.

Reinforcement Learning from Human Feedback (RLHF)

Developing Large Language Models

Reinforcement Learning in Python

Hands-on interactive exercises