Setting up the reward trainer

Your project continues: the model and config objects are ready, so you can now set up training for the reward model.

The training and evaluation datasets have been preloaded as train_data and eval_data. The RewardTrainer has been imported from trl.

This exercise is part of the course

Reinforcement Learning from Human Feedback (RLHF)

Exercise instructions

  • Initialize the RewardTrainer() by passing the model, tokenizer, training dataset, evaluation dataset, and reward configuration as arguments.

Hands-on interactive exercise

Try this exercise by completing the sample code.

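from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig

# Load the tokenizer and a sequence-classification model from the same checkpoint
tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
model = AutoModelForSequenceClassification.from_pretrained("openai-gpt")
# Reward training configuration: output directory and maximum sequence length
config = RewardConfig(output_dir="output_dir", max_length=60)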

# Initialize the reward trainer
trainer = ____
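
One way to fill in the blank can be sketched as follows. The instruction lists five objects to hand to the trainer, and the keyword names used here (model, args, train_dataset, eval_dataset, plus either tokenizer or processing_class) are assumptions based on common TRL releases: older versions appear to accept the tokenizer under tokenizer, newer ones under processing_class. The helper reward_trainer_kwargs is hypothetical and not part of TRL; it only inspects the trainer class to pick whichever keyword that class declares.

```python
import inspect


def reward_trainer_kwargs(trainer_cls, model, tokenizer, config, train_data, eval_data):
    """Hypothetical helper: collect keyword arguments for a TRL-style trainer class."""
    params = inspect.signature(trainer_cls.__init__).parameters
    # Assumption: newer TRL releases name the tokenizer argument `processing_class`,
    # while older releases name it `tokenizer`.
    tok_key = "processing_class" if "processing_class" in params else "tokenizer"
    return {
        "model": model,
        "args": config,  # the RewardConfig object
        "train_dataset": train_data,
        "eval_dataset": eval_data,
        tok_key: tokenizer,
    }


# In the exercise environment, where RewardTrainer, model, tokenizer, config,
# train_data and eval_data already exist, the call would look like:
# trainer = RewardTrainer(
#     **reward_trainer_kwargs(RewardTrainer, model, tokenizer, config, train_data, eval_data)
# )
```

If the installed TRL version is known, the helper is unnecessary and the five keyword arguments can be written directly in the RewardTrainer(...) call.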