8-bit Adam with Accelerator
You would like to customize your training loop with 8-bit Adam to reduce the memory requirements of training your model. Prepare the loop so it trains with the 8-bit Adam optimizer.
Assume that an 8-bit Adam optimizer has been defined as adam_bnb_optim. The other training objects have also been defined: model, train_dataloader, lr_scheduler, and accelerator.
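For context, here is a minimal sketch of how the 8-bit Adam optimizer and accelerator might have been set up, assuming the bitsandbytes, accelerate, and transformers libraries; the model choice and learning rate below are illustrative placeholders, not the exercise's actual definitions.

# Illustrative setup sketch (assumed, not the exercise's real objects)
import bitsandbytes as bnb
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# bitsandbytes' 8-bit Adam stores optimizer states in 8 bits,
# cutting optimizer memory compared to standard 32-bit Adam
adam_bnb_optim = bnb.optim.Adam8bit(model.parameters(), lr=2e-5)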
This exercise is part of the course Efficient AI Model Training with PyTorch.
Exercise instructions
- Prepare the 8-bit Adam optimizer for distributed training.
- Update the model parameters with the optimizer.
- Zero the gradients with the optimizer.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Prepare the 8-bit Adam optimizer for distributed training
model, ____, train_dataloader, lr_scheduler = accelerator.prepare(model, ____, train_dataloader, lr_scheduler)

for batch in train_dataloader:
    inputs, targets = batch["input_ids"], batch["labels"]
    outputs = model(inputs, labels=targets)
    loss = outputs.loss
    accelerator.backward(loss)
    # Update the model parameters
    ____.____()
    lr_scheduler.step()
    # Zero the gradients
    ____.____()
    print(f"Loss = {loss}")