Adafactor with Trainer
You're training a Transformer model with billions of parameters for your language translation service. It is straining your computational resources, so you decide to try the Adafactor optimizer to reduce memory requirements compared to AdamW. Prepare the Trainer for Adafactor!
Some training objects have been pre-loaded, including model, train_dataset, validation_dataset, and compute_metrics.
This exercise is part of the course Efficient AI Model Training with PyTorch.
Exercise instructions
- Specify Adafactor as an optimizer in TrainingArguments.
- Pass in the optimizer state to print the size.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Specify Adafactor as an optimizer
training_args = TrainingArguments(output_dir="./results",
                                  evaluation_strategy="epoch",
                                  ____="____")

trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=train_dataset,
                  eval_dataset=validation_dataset,
                  compute_metrics=compute_metrics)

trainer.train()

# Pass in the optimizer state
total_size_megabytes, total_num_elements = compute_optimizer_size(____.____.____.values())
print(f"\nNumber of optimizer parameters: {total_num_elements:,}\nOptimizer size: {total_size_megabytes:.0f} MB")
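For reference, here is one way the blanks could be filled in. This sketch assumes the Hugging Face transformers TrainingArguments/Trainer API, that the pre-loaded compute_optimizer_size helper accepts the per-parameter state entries of a PyTorch optimizer, and that trainer.optimizer exposes that state after training.

# Select Adafactor by name through the optim argument of TrainingArguments
training_args = TrainingArguments(output_dir="./results",
                                  evaluation_strategy="epoch",
                                  optim="adafactor")

trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=train_dataset,
                  eval_dataset=validation_dataset,
                  compute_metrics=compute_metrics)

trainer.train()

# Pass the optimizer's per-parameter state to the pre-loaded helper to measure its size
total_size_megabytes, total_num_elements = compute_optimizer_size(trainer.optimizer.state.values())
print(f"\nNumber of optimizer parameters: {total_num_elements:,}\nOptimizer size: {total_size_megabytes:.0f} MB")

Because Adafactor keeps factored second-moment statistics instead of full per-parameter moment tensors, the reported optimizer size should come out noticeably smaller than for an equivalent AdamW run.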