Reinforcement Learning from Human Feedback (RLHF)

Exercise

Tokenize a text dataset

You are working on market research for a travel website and would like to use an existing dataset to fine-tune a model that classifies hotel reviews. You decide to use the datasets library.

The AutoTokenizer class has been pre-imported from transformers, and load_dataset() has been pre-imported from datasets.

Instructions

  • Set a padding token on the tokenizer so that text can be processed in equal-sized batches.
  • Tokenize the text data using the pre-trained GPT tokenizer and the defined tokenization function.
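
The two steps above can be illustrated without downloading a model. The sketch below is a simplified, dependency-free stand-in for a real tokenizer, assuming a whitespace split, a toy vocabulary built on the fly, a padding id of 0 and a fixed length of 6 (all hypothetical; GPT tokenizers use byte-pair encoding instead). The function name `tokenize_function` is also hypothetical. It shows what padding and truncation to a fixed length produce for a batch, in the dict-of-lists shape that a batched function receives and returns.

```python
# Toy illustration of padding + truncation for a batch of texts.
# Assumptions (hypothetical): whitespace "tokenizer", a vocabulary grown on the fly,
# PAD_ID = 0 and MAX_LENGTH = 6. Real GPT tokenizers work differently (BPE).
from typing import Dict, List

PAD_ID = 0        # id reserved for the padding token
MAX_LENGTH = 6    # every sequence in the batch ends up exactly this long

VOCAB: Dict[str, int] = {"[PAD]": PAD_ID}  # PAD_ID is never reused for a real word


def encode(text: str) -> List[int]:
    """Map each lowercased whitespace-separated word to an integer id."""
    ids = []
    for word in text.lower().split():
        if word not in VOCAB:
            VOCAB[word] = len(VOCAB)  # next unused id
        ids.append(VOCAB[word])
    return ids


def tokenize_function(examples: Dict[str, List[str]]) -> Dict[str, List[List[int]]]:
    """Hypothetical batched function: truncate, then pad every text to MAX_LENGTH."""
    input_ids, attention_mask = [], []
    for text in examples["text"]:
        ids = encode(text)[:MAX_LENGTH]          # truncation
        n_pad = MAX_LENGTH - len(ids)            # how many pad tokens are needed
        input_ids.append(ids + [PAD_ID] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 0 marks padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


reviews = {"text": ["Great hotel, friendly staff",
                    "Room was dirty and the wifi never worked at all"]}
batch = tokenize_function(reviews)
print(batch["input_ids"])       # [[1, 2, 3, 4, 0, 0], [5, 6, 7, 8, 9, 10]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1]]
```

With the real libraries, the rough equivalent is to give the tokenizer a pad token (GPT-style tokenizers often ship without one; common choices are `tokenizer.pad_token = tokenizer.eos_token` or `tokenizer.add_special_tokens({"pad_token": "[PAD]"})`, where adding a new token also requires `model.resize_token_embeddings(len(tokenizer))` before training), have the tokenization function call `tokenizer(examples["text"], padding="max_length", truncation=True)`, and apply it with `dataset.map(..., batched=True)`. The attention mask is what lets the model ignore the padded positions.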