8-bit Adam with Accelerator
You would like to customize your training loop with 8-bit Adam to reduce the memory requirements of training your model. Prepare the loop so it trains with the 8-bit Adam optimizer.
Assume that an 8-bit Adam optimizer has been defined as adam_bnb_optim. The other training objects have also been defined: model, train_dataloader, lr_scheduler, and accelerator.
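For context, here is a minimal sketch of how the 8-bit Adam optimizer and accelerator might have been set up, assuming the bitsandbytes, accelerate, and transformers libraries; the model choice and learning rate below are illustrative placeholders, not the exercise's actual definitions.

# Illustrative setup sketch (assumed, not the exercise's real objects)
import bitsandbytes as bnb
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# bitsandbytes' 8-bit Adam stores optimizer states in 8 bits,
# cutting optimizer memory compared to standard 32-bit Adam
adam_bnb_optim = bnb.optim.Adam8bit(model.parameters(), lr=2e-5)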
This exercise is part of the course Efficient AI Model Training with PyTorch.
Exercise instructions
- Prepare the 8-bit Adam optimizer for distributed training.
- Update the model parameters with the optimizer.
- Zero the gradients with the optimizer.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Prepare the 8-bit Adam optimizer for distributed training
model, ____, train_dataloader, lr_scheduler = accelerator.prepare(model, ____, train_dataloader, lr_scheduler)

for batch in train_dataloader:
    inputs, targets = batch["input_ids"], batch["labels"]
    outputs = model(inputs, labels=targets)
    loss = outputs.loss
    accelerator.backward(loss)
    # Update the model parameters
    ____.____()
    lr_scheduler.step()
    # Zero the gradients
    ____.____()
    print(f"Loss = {loss}")