Exercise

Checking the reward model

You return to fine-tuning and notice that the model still performs worse than the base model. This time you want to inspect the reward model, so you have produced a dataset of its outputs to analyze. Which checks will you run on the output data?

The dataset has been pre-imported as reward_model_results.

Instructions

50 XP

Possible answers

Looking at extreme cases

Examining the dataset distribution

Normalizing the rewards

All of the above
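
To make the individual checks named in the options concrete, here is a minimal sketch of what each one could look like in code. The schema of reward_model_results is not given in the exercise, so the sketch uses a hypothetical stand-in, sample_results, a list of rows with assumed "text" and "reward" fields. The helper names are illustrative, and z-scoring is only one possible way to normalize rewards.

```python
from statistics import mean, stdev

# Hypothetical stand-in for reward_model_results; the field names
# "text" and "reward" are assumptions, not the real dataset schema.
sample_results = [
    {"text": "response A", "reward": 0.9},
    {"text": "response B", "reward": -0.4},
    {"text": "response C", "reward": 0.1},
    {"text": "response D", "reward": 2.5},
]


def extreme_cases(results, k=1):
    """Return the k lowest-reward and k highest-reward rows."""
    ranked = sorted(results, key=lambda row: row["reward"])
    return ranked[:k], ranked[-k:]


def reward_summary(results):
    """Summarize the reward distribution (needs at least two rows)."""
    rewards = [row["reward"] for row in results]
    return {
        "min": min(rewards),
        "max": max(rewards),
        "mean": mean(rewards),
        "std": stdev(rewards),
    }


def normalize_rewards(results):
    """Z-score the rewards: subtract the mean, divide by the sample std."""
    rewards = [row["reward"] for row in results]
    mu = mean(rewards)
    sigma = stdev(rewards)
    if sigma == 0:
        # All rewards identical: nothing to scale, so return zeros.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


lowest, highest = extreme_cases(sample_results)
summary = reward_summary(sample_results)
normalized = normalize_rewards(sample_results)
```

Reading the lowest- and highest-scored responses by hand can show whether the scores match intuition, while the summary statistics can reveal a skewed or very wide reward scale.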
