Compute the optimizer size

You're exploring different optimizers for training a model, and you need to quantify an optimizer's memory usage for an objective comparison. As a test, you've loaded a DistilBERT model and AdamW optimizer so that you quantify memory usage. Write the compute_optimizer_size function to compute the size of an optimizer.

The AdamW optimizer has been defined, and training has completed using optimizer.

This exercise is part of the course

Efficient AI Model Training with PyTorch

View Course

Exercise instructions

Compute number of elements and size of each tensor in the for loop.
Compute the total size of the optimizer in megabytes.
Pass in the optimizer state to compute_optimizer_size.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

def compute_optimizer_size(optimizer_state):
    total_size_megabytes, total_num_elements = 0, 0
    for params in optimizer_state:
        for name, tensor in params.items():
            tensor = torch.tensor(tensor)
            # Compute number of elements and size of each tensor
            num_elements, element_size = tensor.____(), tensor.____()
            total_num_elements += num_elements
            # Compute the total size in megabytes
            total_size_megabytes += ____ * ____ / (1024 ** 2)
    return total_size_megabytes, total_num_elements

# Pass in the optimizer state
total_size_megabytes, total_num_elements = compute_optimizer_size(____.____.____())
print(f"Number of optimizer parameters: {total_num_elements:,}\nOptimizer size: {total_size_megabytes:.0f} MB")

Edit and Run Code