1. Fine-tuning through training
With the data prepared, the next step is to set up our training arguments and loop.
2. Training Arguments
We begin with the TrainingArguments() class from the transformers library. This class allows customization to balance efficiency and performance. We'll cover key parameters here, but reviewing the full documentation is recommended, as suitable values depend on the use case, dataset size, and speed requirements.
output_dir is the output directory where model predictions and checkpoints will be written.
evaluation_strategy determines when to evaluate during training. We've set it to epoch, meaning we evaluate at the end of each epoch, that is, after one complete pass over the full training dataset (useful if the dataset is small). It could also be set to steps, where evaluation runs every eval_steps training steps, or to no, disabling evaluation during training.
num_train_epochs is the number of training epochs.
learning_rate is the learning rate for the optimizer.
3. Training Arguments
per_device_train_batch_size and per_device_eval_batch_size set the batch sizes used during training and evaluation, respectively.
And finally, we set a weight_decay, as we'll be fine-tuning with a small dataset and this regularization helps us avoid overfitting.
4. Trainer class
With the training arguments set up, we then pass this object to a Trainer class instance.
Here, we define the model we want to fine-tune. We set our training arguments to the args parameter. We then want to define our training and testing data under train_dataset and eval_dataset, and specify the tokenizer to ensure our preprocessing is always consistent.
We launch the training loop with the .train() method. The dataset size, num_train_epochs, and the per-device batch sizes determine the number of training steps.
5. Trainer output
The output of trainer.train() tells us how the training process is going. We expect the loss to decrease, which it does here. This example used a subset of the IMDb data for training.
The model is now fine-tuned.
6. Using the fine-tuned model
We can use the fine-tuned model to make predictions on new data. First, we prepare the new input data using the same tokenizer. Before passing these inputs through the fine-tuned model, we wrap this process in torch.no_grad() to disable gradient tracking, since gradients aren't needed for evaluation. We extract the predicted classes with .argmax() and print them using a for loop for better readability.
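The no_grad-plus-argmax pattern can be sketched with a stand-in classifier, a plain linear layer over dummy features, used here only so the example runs without the fine-tuned model; with the real model you would pass the tokenized inputs instead:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)      # stand-in for the fine-tuned two-class model
inputs = torch.randn(3, 4)   # three "examples" of dummy features

with torch.no_grad():        # disable gradient tracking for inference
    logits = model(inputs)

predicted = logits.argmax(dim=-1)  # highest-scoring class per example
for i, label in enumerate(predicted):
    print(f"Example {i}: class {label.item()}")
```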
7. Fine-tuning results
Our results are successful!
8. Saving models and tokenizers
Finally, we save the fine-tuned model and tokenizer using .save_pretrained() and specify the path. While fine-tuning doesn't change the tokenizer, saving it with the updated model ensures reproducibility.
To reload, we use the same Auto classes and the path to the saved model and tokenizer.
9. Let's practice!
Let's practice!