Setting up the reward trainer
Your project continues, and you now have the model and config objects ready to start training the reward model. The training and evaluation datasets have been preloaded as train_data and eval_data. The RewardTrainer has been imported from trl.
This exercise is part of the course Reinforcement Learning from Human Feedback (RLHF).
Exercise instructions
- Initialize the RewardTrainer() by assigning the model, tokenizer, training dataset, evaluation dataset, and reward configuration to its corresponding arguments.
Hands-on interactive exercise
Try this exercise and complete the sample code.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from trl import RewardConfig, RewardTrainer

# Load the tokenizer and a sequence-classification model to serve as the reward model
tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
model = AutoModelForSequenceClassification.from_pretrained("openai-gpt")

# Reward training configuration
config = RewardConfig(output_dir="output_dir", max_length=60)
# Initialize the reward trainer
trainer = ____
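One way to fill in the blank, as a sketch: pass the model, the RewardConfig, the tokenizer, and the preloaded train_data and eval_data to RewardTrainer. Note that newer trl releases renamed the tokenizer keyword to processing_class, so the exact argument name depends on your installed version.

# A possible completion (sketch). Assumes train_data and eval_data are
# preloaded, as stated in the exercise setup.
trainer = RewardTrainer(
    model=model,
    args=config,
    tokenizer=tokenizer,  # use processing_class=tokenizer in newer trl versions
    train_dataset=train_data,
    eval_dataset=eval_data,
)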