Reinforcement learning from human feedback

This exercise is part of the course

Google DeepMind: Fine-Tune Your Model

Skill level: Intermediate

Rating: 4.8+ (17 reviews)

In this module, you will see why formatting matters when fine-tuning large language models. You will explore the formats required for different tasks. You will pre-process a dataset derived from the Africa Galore dataset and transform it into a question-and-answer format. Later modules use this formatted dataset to fine-tune your language model so that it can generate revision study flashcards, a task that it was not pre-trained to do.

Exercise 1: Formatting
Exercise 2: Lab: Format Text for Turn-Based Dialogue
Exercise 3: Input and output formats
Exercise 4: Quiz 1 - Question 1
Exercise 5: Quiz 1 - Question 2
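The question-and-answer transformation mentioned in the module summary above could look something like the following minimal sketch. The field names `question` and `answer`, the `Question:`/`Answer:` template, and the sample record are assumptions made for illustration; the course labs may use a different format, and the record is not taken from the Africa Galore dataset.

```python
def format_flashcard(question: str, answer: str) -> str:
    """Render one question-answer pair as a single training string.

    The "Question:/Answer:" template is a hypothetical choice for this sketch.
    """
    return f"Question: {question.strip()}\nAnswer: {answer.strip()}"


# Hypothetical record, invented for illustration only.
record = {
    "question": "  What is fine-tuning? ",
    "answer": "Continuing the training of a pre-trained model on a smaller dataset.",
}

formatted = format_flashcard(record["question"], record["answer"])
print(formatted)
```

Keeping the template in one small function means every training example is rendered the same way, which is the main point of a formatting step.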

In this module, you will learn full-parameter fine-tuning, a straightforward method for adapting pre-trained models. You will first take the small language model you built in course 04, Discover The Transformer Architecture, and continue its training on a small, specialized dataset so that it generates revision study flashcards in the format you created in the previous module. This process will let you compare fine-tuning with training from scratch and observe the key similarities and differences in the development pipeline. You will also consider how people make sense of AI within cultural contexts by reading a story about AI and then writing your own short piece of fiction. The aim is to explore how narrative can complement anticipation and reflection as a way of revealing cultural meaning and values.

Exercise 1: The advantages of fine-tuning
Exercise 2: Full-parameter fine-tuning
Exercise 3: Lab: Fine-Tune All The Parameters of Your SLM
Exercise 4: When to stop fine-tuning
Exercise 5: What makes a "good" flashcard?
Exercise 6: Imagining AI in cultural contexts
Exercise 7: Write your own piece of AI fiction
Exercise 8: Quiz 2 - Question 1
Exercise 9: Quiz 2 - Question 2

In this module, you will explore low-rank adaptation (LoRA), a popular parameter-efficient fine-tuning (PEFT) technique and a more computationally efficient alternative to full-parameter fine-tuning. You will investigate LoRA by applying it to fine-tune the Gemma3-1B model, which has one billion parameters. This will let you experience first-hand how LoRA can achieve strong results at a fraction of the computational cost of full-parameter fine-tuning.

Exercise 1: Foundation models
Exercise 2: Lab: Full-Parameter Fine-Tuning of Gemma
Exercise 3: Low-rank adaptation (LoRA)
Exercise 4: Lab: Implement LoRA for Parameter-Efficient Fine-Tuning
Exercise 5: Lab: Fine-Tuning Gemma3-1B with LoRA
Exercise 6: Quiz 3 - Question 1
Exercise 7: Quiz 3 - Question 2
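To make the "fraction of the computational cost" claim in the module summary concrete, here is a rough sketch that counts trainable parameters for a single weight matrix. It assumes the usual LoRA setup, in which a frozen matrix of shape `d_out x d_in` is adapted by two small trainable matrices of shapes `d_out x rank` and `rank x d_in`. The layer sizes and rank below are illustrative values, not Gemma3-1B's actual dimensions, and the function names are hypothetical.

```python
def full_trainable_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a rank-`rank` LoRA adapter on the same matrix.

    Only the two low-rank factors are trained; the original matrix stays frozen.
    """
    return rank * (d_in + d_out)


# Illustrative sizes only (assumed, not taken from any specific model).
d_in, d_out, rank = 1024, 1024, 8

full = full_trainable_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full}, LoRA: {lora}, LoRA/full: {lora / full:.4f}")
```

Because the adapter size grows with `rank * (d_in + d_out)` rather than `d_in * d_out`, the saving becomes larger as the layers get wider.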

In this module, you will consider the limitations of supervised fine-tuning. You will then be given a brief overview of advanced techniques based on reinforcement learning (RL). This will introduce you to how these approaches can better align a model's behavior with human values and preferences.

Exercise 1: Opportunities and limitations of SFT
Exercise 2: Reinforcement learning from human feedback (current exercise)
Exercise 3: Quiz 4 - Question 1
Exercise 4: Quiz 4 - Question 2
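As a rough illustration of how human preferences can become a training signal, the sketch below computes a pairwise preference loss of the kind commonly used to train a reward model in reinforcement learning from human feedback. This is an assumption about one common formulation (a Bradley-Terry style loss), not a description of what this course's exercise covers. The function name and the scalar reward inputs are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the response preferred by the human rater
    receives the higher reward, and large when the ordering is reversed.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two candidate responses to the same prompt.
agrees_with_rater = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_rater = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees_with_rater < disagrees_with_rater)
```

Minimizing this loss over many human-labeled comparisons pushes the reward model to score preferred responses higher, and that reward can then guide a reinforcement learning step.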

In this module, you will explore foresight and governance. You will consider how the values and meanings revealed through storytelling can inform foresight reporting and help you to design governance responses, including enforceable rules, transparency, and accountability. This equips you to think about how strong governance can protect communities, ensure equity, and align AI with societal values.

Exercise 1: Foresight and governance
Exercise 2: Designing a governance blueprint for your LLM
Exercise 3: Quiz 5 - Question 1
Exercise 4: Quiz 5 - Question 2