
Checking the reward model

You go back to fine-tuning the model and notice that its performance is still worse than the base model's. This time, you want to inspect the reward model, so you've produced a dataset of results from it to analyze. What checks will you make on the output data?

The dataset has been pre-imported as reward_model_results.
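As a starting point, here is a minimal sketch of checks you might run, assuming reward_model_results is a pandas DataFrame and assuming (hypothetical) chosen_score and rejected_score columns that hold the reward model's scores for the preferred and rejected response in each comparison pair:

# reward_model_results is pre-imported in the exercise environment.
# Column names chosen_score / rejected_score are assumptions for illustration.

# Basic sanity checks: missing values and score ranges
print(reward_model_results.isna().sum())
print(reward_model_results[["chosen_score", "rejected_score"]].describe())

# Preference accuracy: how often the reward model ranks the
# human-preferred (chosen) response above the rejected one
accuracy = (
    reward_model_results["chosen_score"] > reward_model_results["rejected_score"]
).mean()
print(f"Reward model preference accuracy: {accuracy:.2%}")

# Score margins: small or negative margins suggest a weak reward signal
margins = reward_model_results["chosen_score"] - reward_model_results["rejected_score"]
print(margins.describe())

Low preference accuracy or narrow score margins would point to the reward model, rather than the fine-tuning step, as the reason the tuned model underperforms the base model.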

This exercise is part of the course

Reinforcement Learning from Human Feedback (RLHF)

