Compute the optimizer size
You're exploring different optimizers for training a model, and you need to quantify each optimizer's memory usage for an objective comparison. As a test, you've loaded a DistilBERT model and an AdamW optimizer so that you can quantify memory usage. Write the compute_optimizer_size function to compute the size of an optimizer.
The AdamW optimizer has been defined directly (without Trainer), and training has completed.
This exercise is part of the course
Efficient AI Model Training with PyTorch
Exercise instructions
- Compute the number of elements and the size of each tensor in the for loop.
- Compute the total size of the optimizer in megabytes.
- Access the optimizer state dictionary using the appropriate method on optimizer.state.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def compute_optimizer_size(optimizer_state):
    total_size_megabytes, total_num_elements = 0, 0
    for params in optimizer_state:
        for name, tensor in params.items():
            tensor = torch.tensor(tensor)
            # Compute number of elements and size of each tensor
            num_elements, element_size = tensor.____(), tensor.____()
            total_num_elements += num_elements
            # Compute the total size in megabytes
            total_size_megabytes += ____ * ____ / (1024 ** 2)
    return total_size_megabytes, total_num_elements

# Pass in the optimizer state
total_size_megabytes, total_num_elements = compute_optimizer_size(optimizer.state.____())
print(f"Number of optimizer parameters: {total_num_elements:,}\nOptimizer size: {total_size_megabytes:.0f} MB")
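To see the same measurement on something small enough to check by hand, here is a minimal, self-contained sketch. It is not the exercise's reference solution: the tiny nn.Linear model, the helper name optimizer_state_size, and the single training step are all assumptions made for illustration. It relies on the fact that PyTorch optimizers fill optimizer.state lazily, so the state is empty until step() has been called at least once, which is why the exercise notes that training has completed.

```python
import torch
from torch import nn


def optimizer_state_size(optimizer):
    """Return (size in MB, element count) over all optimizer state values.

    Hypothetical helper for illustration only.
    """
    total_bytes, total_elements = 0, 0
    # optimizer.state maps each parameter to a dict of its state entries
    for state in optimizer.state.values():
        for value in state.values():
            # Some PyTorch versions store the step counter as a plain number
            if not torch.is_tensor(value):
                value = torch.tensor(value)
            total_elements += value.numel()
            total_bytes += value.numel() * value.element_size()
    return total_bytes / (1024 ** 2), total_elements


# Assumed toy setup: a 10 -> 5 linear layer (50 weights + 5 biases)
model = nn.Linear(10, 5)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One optimization step so that AdamW creates its per-parameter state
loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()

size_mb, num_elements = optimizer_state_size(optimizer)
print(f"Optimizer state elements: {num_elements:,}")
print(f"Optimizer state size: {size_mb:.6f} MB")
```

With default settings, AdamW keeps two moment estimates per parameter element plus one step counter per parameter tensor, so the element count here is roughly twice the model's 55 parameters. The same relationship is what makes the optimizer state of a larger model such as DistilBERT substantially bigger than the model weights themselves.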