Set up the 8-bit Adam optimizer
You're finding that your Transformer model for real-time language translation isn't learning effectively when trained with Adafactor. As an alternative, you decide to try an 8-bit Adam optimizer, which reduces optimizer memory by approximately 75% compared to standard Adam.
The bitsandbytes library has been imported as bnb, TrainingArguments has been defined as args, and optimizer_grouped_parameters has been pre-loaded. Note that the exercise prints a warning message about libbitsandbytes_cpu.so, but you can ignore this warning to complete the exercise.
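For reference, a minimal sketch of what this pre-loaded setup might look like is shown below. The placeholder model, the specific TrainingArguments values, and the decay/no-decay parameter grouping are assumptions for illustration, not the exercise's actual pre-loaded code.

import torch
import bitsandbytes as bnb
from transformers import TrainingArguments

# Placeholder model standing in for the translation Transformer (assumption)
model = torch.nn.Linear(8, 8)

# Assumed hyperparameter values; the real args object is pre-loaded in the exercise
args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-5,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    weight_decay=0.01,
)

# Group parameters so that biases are excluded from weight decay
decay_params = [p for n, p in model.named_parameters() if "bias" not in n]
no_decay_params = [p for n, p in model.named_parameters() if "bias" in n]
optimizer_grouped_parameters = [
    {"params": decay_params, "weight_decay": args.weight_decay},
    {"params": no_decay_params, "weight_decay": 0.0},
]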
This exercise is part of the course Efficient AI Model Training with PyTorch.
Exercise instructions
- Instantiate the 8-bit Adam optimizer from the bitsandbytes library.
- Pass in the beta1 and beta2 parameters to the 8-bit Adam optimizer.
- Pass in the epsilon parameter to the 8-bit Adam optimizer.
- Print the input parameters from the 8-bit Adam optimizer.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Instantiate the 8-bit Adam optimizer
adam_bnb_optim = ____.____.____(optimizer_grouped_parameters,
                                    # Pass in the beta1 and beta2 parameters
                                    betas=(args.____, args.____),
                                    # Pass in the epsilon parameter
                                    eps=args.____,
                                    lr=args.learning_rate)
# Print the input parameters
print(f"beta1 = {args.____}")
print(f"beta2 = {args.____}")
print(f"epsilon = {args.____}")
print(f"learning_rate = {args.learning_rate}")