Mitigating negative KL divergence
You were fine-tuning the model using RLHF techniques and noticed that its performance has worsened compared to the base model. You suspect this is due to negative KL divergence, which the optimizer can exploit as extra reward when generation constraints force the model to sample tokens that are unlikely under its own distribution (see the sketch below). To prevent this, you want to set the correct generation parameters.
The tokenizer has been pre-imported.
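To see why negative KL is exploitable, here is a minimal sketch of the per-token KL estimate used as a penalty in PPO-style RLHF. The log-probability values are made up for illustration, and the variable names are not part of the exercise.

import torch

# Per-token KL estimate: kl_t = log p_active(token_t) - log p_ref(token_t).
# Constrained sampling (e.g., a min_length that bans EOS, or top_k filtering)
# can force tokens the active model assigns low probability, driving it negative.
logprobs_active = torch.tensor([-2.3, -0.5, -4.1])  # trained (active) model
logprobs_ref = torch.tensor([-2.0, -0.6, -1.2])     # frozen reference model
kl_per_token = logprobs_active - logprobs_ref
print(kl_per_token.sum())  # tensor(-3.1000): a negative "penalty" adds reward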
This exercise is part of the course
Reinforcement Learning from Human Feedback (RLHF)
Exercise instructions
- Set top_k and min_length to values that help avoid negative KL divergence.
Interactive hands-on exercise
Try to solve this exercise by completing the sample code.
generation_kwargs = {
    # Set min length and top k parameters:
    # min_length=-1 removes the minimum-length constraint (EOS is never banned),
    # and top_k=0.0 disables top-k filtering, so tokens are sampled from the
    # model's full, unmodified distribution, avoiding negative KL divergence
    "min_length": -1,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "max_new_tokens": 32}