
Exercise

Set up the 8-bit Adam optimizer

You're finding that your Transformer model for real-time language translation isn't learning effectively when trained with Adafactor. As an alternative, you decide to try an 8-bit Adam optimizer, which reduces optimizer state memory by approximately 75% compared to standard 32-bit Adam.

The bitsandbytes library has been imported as bnb, TrainingArguments has been defined as args, and optimizer_grouped_parameters has been pre-loaded. Note that the exercise prints a warning message about libbitsandbytes_cpu.so; you can safely ignore it and still complete the exercise.

Instructions

100 XP
  • Instantiate the 8-bit Adam optimizer from the bitsandbytes library.
  • Pass in the beta1 and beta2 parameters to the 8-bit Adam optimizer.
  • Pass in the epsilon parameter to the 8-bit Adam optimizer.
  • Print the input parameters from the 8-bit Adam optimizer.
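A minimal sketch of one possible solution is shown below. It assumes, as described above, that bnb (the bitsandbytes library), args (a Hugging Face TrainingArguments instance), and optimizer_grouped_parameters (a list of parameter groups) are already pre-loaded; the variable name adam_bnb_optim is illustrative.

```python
# Instantiate the 8-bit Adam optimizer over the grouped model parameters
adam_bnb_optim = bnb.optim.Adam8bit(
    optimizer_grouped_parameters,
    lr=args.learning_rate,
    # Pass in the beta1 and beta2 parameters from the training arguments
    betas=(args.adam_beta1, args.adam_beta2),
    # Pass in the epsilon parameter from the training arguments
    eps=args.adam_epsilon,
)

# Print the input parameters of the 8-bit Adam optimizer
# (the defaults dict holds the lr, betas, and eps passed in above)
print(adam_bnb_optim.defaults)
```

Because bitsandbytes optimizers subclass torch.optim.Optimizer, the values passed in are also visible per parameter group via adam_bnb_optim.param_groups, which is another way to confirm the beta, epsilon, and learning-rate settings.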