
Checking the reward model

You return to fine-tuning and notice that the model still performs worse than the base model. This time, you want to inspect the reward model, so you've produced a dataset of its results to analyze. What checks will you run on the output data?

The dataset has been pre-imported as reward_model_results.
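As a starting point for thinking about such checks, here is a minimal sketch in plain Python. The exercise does not show the structure of reward_model_results, so everything below is an assumption: the records are taken to be a list of dictionaries, and the field names prompt, response, and reward, the helper check_reward_outputs, and the expected score range are all hypothetical. The sketch counts missing scores, exact duplicate records, and scores outside the assumed range.

```python
from collections import Counter


def check_reward_outputs(rows, score_key="reward", expected_range=(-5.0, 5.0)):
    """Return simple data-quality counts for reward model outputs.

    Assumes `rows` is a list of dicts (hypothetical layout) and that a
    missing score is stored as None. `expected_range` is an assumed bound.
    """
    low, high = expected_range
    scores = [row.get(score_key) for row in rows]
    present = [s for s in scores if s is not None]

    # Count exact repeats of a whole record beyond its first occurrence.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    n_duplicates = sum(c - 1 for c in counts.values())

    return {
        "n_rows": len(rows),
        "n_missing_scores": len(scores) - len(present),
        "n_duplicate_rows": n_duplicates,
        "n_out_of_range": sum(1 for s in present if not low <= s <= high),
    }


# Hypothetical stand-in data, not the real reward_model_results.
sample = [
    {"prompt": "a", "response": "x", "reward": 0.4},
    {"prompt": "b", "response": "y", "reward": 1.2},
    {"prompt": "b", "response": "y", "reward": 1.2},   # exact duplicate
    {"prompt": "c", "response": "z", "reward": None},  # missing score
    {"prompt": "d", "response": "w", "reward": 42.0},  # extreme value
]

report = check_reward_outputs(sample)
print(report)
```

If the real dataset has a different layout, the same three questions still apply: are any scores missing, are any records repeated, and do any scores fall far outside the range you expect?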

This exercise is part of the course

Reinforcement Learning from Human Feedback (RLHF)
