Efficient AI Model Training with PyTorch

Preprocess text with AutoTokenizer

You're building a precision agriculture application that lets farmers ask questions about issues they encounter in the field. You'll use a dataset of common questions and answers about problems farmers face; its fields are:

  • question: common agricultural questions
  • answers: answers to the agricultural questions

As a first step toward distributed training, you'll preprocess this text dataset.

Some data has been preloaded:

  • dataset contains a sample dataset of agricultural questions and answers
  • AutoTokenizer has been imported from transformers

Instructions

  • Load a pre-trained tokenizer.
  • Tokenize example["question"] using the tokenizer.
  • Apply the encode() function to the dataset.