Mapping tokenization
You now want more control over tokenization and would like to try tokenizing the data in rows or batches. This will also give you a result that is a Dataset object, which you'll need for training.
The tokenizer has been loaded for you along with the data as train_data and test_data.
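For context, a minimal sketch of how such a setup could look, assuming a Hugging Face AutoTokenizer and Dataset objects with an "interaction" text column; the checkpoint name and example texts below are illustrative, since the exercise already provides tokenizer, train_data, and test_data for you.

# Illustrative setup only: in the exercise these objects are preloaded
from transformers import AutoTokenizer
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint

train_data = Dataset.from_dict({
    "interaction": ["Hello, how can I help you?", "I need to reset my password."],
    "label": [0, 1],
})
test_data = Dataset.from_dict({
    "interaction": ["Where is my order?"],
    "label": [1],
})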
Finish this exercise by completing the sample code below.
# Complete the function: tokenize the "interaction" column, returning
# PyTorch tensors that are padded and truncated to a length of 64 tokens
def tokenize_function(data):
    return tokenizer(data["interaction"],
                     return_tensors="pt",
                     padding=True,
                     truncation=True,
                     max_length=64)

# Apply the function to the training data in batches
tokenized_in_batches = train_data.map(tokenize_function, batched=True)
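As a quick check, a sketch of how you could apply the same mapping to the test split and inspect the result, assuming the completed function above; the printed column names will vary with your data.

# Tokenize the test split the same way and look at the new columns
tokenized_test = test_data.map(tokenize_function, batched=True)
print(tokenized_in_batches.column_names)          # e.g. ['interaction', 'label', 'input_ids', 'attention_mask', ...]
print(tokenized_in_batches[0]["input_ids"][:10])  # first ten token IDs of the first example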