Set up the 8-bit Adam optimizer
You're finding that your Transformer model for real-time language translation isn't learning effectively when trained with Adafactor. As an alternative, you decide to try an 8-bit Adam optimizer, which reduces optimizer memory by approximately 75% compared to standard Adam.
The bitsandbytes library has been imported as bnb, TrainingArguments has been defined as args, and optimizer_grouped_parameters has been pre-loaded. Note that the exercise prints a warning message about libbitsandbytes_cpu.so, but you can ignore this warning to complete the exercise.
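For reference, a minimal sketch of what this pre-loaded setup might look like is shown below. The placeholder model, the specific TrainingArguments values, and the decay/no-decay parameter grouping are assumptions for illustration, not the exercise's actual pre-loaded code.

import torch
import bitsandbytes as bnb
from transformers import TrainingArguments

# Placeholder model standing in for the translation Transformer (assumption)
model = torch.nn.Linear(8, 8)

# Assumed hyperparameter values; the real args object is pre-loaded in the exercise
args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-5,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    weight_decay=0.01,
)

# Group parameters so that biases are excluded from weight decay
decay_params = [p for n, p in model.named_parameters() if "bias" not in n]
no_decay_params = [p for n, p in model.named_parameters() if "bias" in n]
optimizer_grouped_parameters = [
    {"params": decay_params, "weight_decay": args.weight_decay},
    {"params": no_decay_params, "weight_decay": 0.0},
]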
This exercise is part of the course Efficient AI Model Training with PyTorch.
Exercise instructions
- Instantiate the 8-bit Adam optimizer from the bitsandbytes library.
- Pass in the beta1 and beta2 parameters to the 8-bit Adam optimizer.
- Pass in the epsilon parameter to the 8-bit Adam optimizer.
- Print the input parameters from the 8-bit Adam optimizer.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Instantiate the 8-bit Adam optimizer
adam_bnb_optim = ____.____.____(optimizer_grouped_parameters,
                                    # Pass in the beta1 and beta2 parameters
                                    betas=(args.____, args.____),
                                    # Pass in the epsilon parameter
                                    eps=args.____,
                                    lr=args.learning_rate)
# Print the input parameters
print(f"beta1 = {args.____}")
print(f"beta2 = {args.____}")
print(f"epsilon = {args.____}")
print(f"learning_rate = {args.learning_rate}")