
Initializing the reward model

You are in the final stages of deploying a generative model designed to offer personalized recommendations for an online bookstore. To align this model with human-preferred recommendations, you need to train a reward model using some collected preference data. The first step is to initialize the model and configuration parameters.

The AutoTokenizer and AutoModelForSequenceClassification were preloaded from transformers. RewardConfig was preloaded from trl.

This exercise is part of the course Reinforcement Learning from Human Feedback (RLHF).

Exercise instructions

  • Load the GPT-1 model, "openai-gpt", for the sequence classification task using Hugging Face's AutoModelForSequenceClassification.
  • Initialize the reward configuration using "output_dir" as the output directory, and set the maximum token length to 60.

Hands-on interactive exercise

Complete the sample code to finish this exercise.

# Load the pre-trained GPT-1 model for sequence classification
model = ____

tokenizer = AutoTokenizer.from_pretrained("openai-gpt")

# Initialize the reward configuration and set max_length
config = ____(output_dir=____, max_length=____)
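
One possible completion is sketched below, assuming the standard transformers and trl APIs; the imports are shown for completeness even though both classes are preloaded in the exercise environment.

# Imports (already preloaded in the exercise environment)
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from trl import RewardConfig

# Load the pre-trained GPT-1 model with a sequence classification head
model = AutoModelForSequenceClassification.from_pretrained("openai-gpt")

tokenizer = AutoTokenizer.from_pretrained("openai-gpt")

# Initialize the reward configuration with the output directory and maximum token length
config = RewardConfig(output_dir="output_dir", max_length=60)

In practice, reward models are often loaded with num_labels=1 so the head produces a single scalar score, but the exercise instructions do not require that argument.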