Mapping tokenization
You now want more control over tokenization and will try tokenizing the data in rows or batches. This also gives you a result that is a Dataset object, which you'll need for training.
The tokenizer has been loaded for you, along with the data as train_data and test_data.
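For context, here is a minimal sketch of what that pre-loaded setup might look like. The checkpoint name and the toy rows are assumptions for illustration only; the exercise loads the real tokenizer and data for you.

# A minimal sketch of the assumed setup; the checkpoint and example rows
# are illustrative stand-ins, not the course's actual data.
from transformers import AutoTokenizer
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint

train_data = Dataset.from_dict(
    {"interaction": ["Hi, I need help with my order.",
                     "Can you reset my password?"]}
)
test_data = Dataset.from_dict(
    {"interaction": ["Where is my package?"]}
)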
This exercise is part of the course Introduction to LLMs in Python.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Complete the function
def tokenize_function(data):
    return tokenizer(data["interaction"],
                     ____,
                     padding=True,
                     ____,
                     max_length=64)

tokenized_in_batches = ____
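For reference, one plausible completion is sketched below. It assumes the two blanks inside the tokenizer call are return_tensors="pt" and truncation=True, and that the final blank maps tokenize_function over train_data in batches with Dataset.map; the platform's expected answer may differ.

# A possible completion (a sketch, not the official solution)
def tokenize_function(data):
    return tokenizer(data["interaction"],
                     return_tensors="pt",  # assumed: return PyTorch tensors
                     padding=True,
                     truncation=True,      # assumed: truncate to max_length
                     max_length=64)

# batched=True passes a batch of rows to tokenize_function at a time,
# and the result is a Dataset with the tokenized columns added.
tokenized_in_batches = train_data.map(tokenize_function, batched=True)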