Mitigating negative KL divergence
You are fine-tuning a model with RLHF and notice that its performance has degraded compared to the base model. You suspect this is due to negative KL divergence, so you want to set generation parameters that prevent the issue.
The tokenizer has been pre-imported.
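Why can the KL estimate go negative? In TRL-style PPO, the per-token KL penalty is approximated as the sampled token's log-probability under the active model minus its log-probability under the frozen reference model. Generation tricks such as a positive min_length (which masks the EOS logit) or top-k filtering make the sampler draw tokens the active model itself assigns low probability, so this difference can turn negative. A minimal numeric sketch, assuming this standard log-ratio estimator; the tensor values are illustrative only:

import torch

# Per-token KL estimate used in TRL-style PPO:
# log p_active(token) - log p_ref(token).
# Suppose the second token was forced by a generation constraint
# (e.g. EOS was masked out by min_length), so the active model
# assigns it a very low probability.
logprob_active = torch.tensor([-0.5, -9.0])
logprob_ref = torch.tensor([-0.6, -1.2])

kl_per_token = logprob_active - logprob_ref
print(kl_per_token)  # tensor([ 0.1000, -7.8000]): the forced token contributes a negative KL term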
This exercise is part of the course
Reinforcement Learning from Human Feedback (RLHF)
Exercise instructions
- Set `top_k` and `min_length` to values that help avoid negative KL divergence.
Interactive exercise
Complete the sample code to finish this exercise successfully.
generation_kwargs = {
    # Set min length and top k parameters:
    # min_length=-1 leaves the EOS token unmasked and top_k=0.0 disables
    # top-k filtering, so sampling follows the model's true distribution
    # and the KL estimate stays non-negative in expectation
    "min_length": -1,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "max_new_tokens": 32}