Adafactor with Accelerator
You've demonstrated a proof-of-concept of Adafactor with Trainer to train your language translation model with reduced memory requirements. Now you'd like to customize your training loop using Accelerator. Build the training loop to use Adafactor!
The compute_optimizer_size() function has been pre-defined. Some training objects have been pre-loaded: model, train_dataloader, and accelerator.
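For reference, compute_optimizer_size() is supplied by the exercise environment. One plausible implementation (a sketch, not necessarily the course's exact code) walks over the tensors held in the optimizer state and sums their element counts and byte sizes:

import torch

def compute_optimizer_size(state_values):
    """Sum element counts and total size (in MB) of all tensors in an optimizer state."""
    total_num_elements = 0
    total_size_bytes = 0
    for state in state_values:           # one state dict per parameter
        for value in state.values():     # e.g. step counters, factored second moments
            if torch.is_tensor(value):
                total_num_elements += value.numel()
                total_size_bytes += value.numel() * value.element_size()
    total_size_megabytes = total_size_bytes / (1024 ** 2)
    return total_size_megabytes, total_num_elements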
This exercise is part of the course Efficient AI Model Training with PyTorch.
Exercise instructions
- Pass the model parameters to Adafactor when defining the optimizer.
- Pass in the optimizer state to print the size.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Pass the model parameters to Adafactor
optimizer = ____(params=____.____())
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)
for batch in train_dataloader:
    inputs, targets = batch["input_ids"], batch["labels"]
    outputs = model(inputs, labels=targets)
    loss = outputs.loss
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
# Pass in the optimizer state
total_size_megabytes, total_num_elements = compute_optimizer_size(____.____.values())
print(f"Number of optimizer parameters: {total_num_elements:,}\nOptimizer size: {total_size_megabytes:.0f} MB")
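One way to complete the blanks is sketched below. It assumes the pre-loaded model, train_dataloader, and accelerator from the exercise environment, with Adafactor imported from Hugging Face transformers; Adafactor's defaults already use its memory-saving factored second moments.

from transformers import Adafactor

# Pass the model parameters to Adafactor
optimizer = Adafactor(params=model.parameters())

# Let Accelerator place the model, optimizer, and dataloader on the right device(s)
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

for batch in train_dataloader:
    inputs, targets = batch["input_ids"], batch["labels"]
    outputs = model(inputs, labels=targets)
    loss = outputs.loss
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()

# Pass in the optimizer state to print the size
total_size_megabytes, total_num_elements = compute_optimizer_size(optimizer.state.values())
print(f"Number of optimizer parameters: {total_num_elements:,}\nOptimizer size: {total_size_megabytes:.0f} MB")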