
Train models with Accelerator

1. Train models with Accelerator

Next up, we'll train models with Accelerator!

2. Trainer and Accelerator

We saw that the Trainer provides a basic interface for training models.

3. Custom training loops

A limitation of Trainer is that it can't accommodate custom training loops: for example, some advanced tasks in generative AI require training two networks.

4. Trainer and Accelerator

Accelerator provides greater control over the training loop, and it prepares our model for distributed training with device placement and data parallelism, as we've seen.

5. Modifying a basic training loop

Let's examine how to modify a PyTorch training loop for distributed training. First, we zero the gradients. Next, we move the data to a specified device using .to(device). Then we perform a forward pass with the model and use the default cross-entropy loss function, accessed with outputs.loss. Finally, we compute gradients and update the model parameters and learning rate. Now let's modify this example with Accelerator.
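As a reference, here is a minimal sketch of that basic loop. It assumes a Hugging Face sequence-classification model whose batches contain input_ids, attention_mask, and labels, and that model, dataloader, optimizer, and lr_scheduler are defined elsewhere (for instance, as in the upcoming slides).

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.train()

for batch in dataloader:
    optimizer.zero_grad()                                 # zero the gradients
    batch = {k: v.to(device) for k, v in batch.items()}   # move data to the device
    outputs = model(**batch)                              # forward pass
    loss = outputs.loss                                    # cross-entropy loss from the model
    loss.backward()                                        # compute gradients
    optimizer.step()                                       # update model parameters
    lr_scheduler.step()                                    # update the learning rate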

6. Create an Accelerator object

Accelerator provides an interface for distributed training. By default, it handles device placement, but we can choose to move data manually by changing device_placement from True to False.
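A minimal sketch, assuming the accelerate package is installed; device_placement is True by default, so it only needs to be passed explicitly to switch it off.

from accelerate import Accelerator

# Automatic device placement (the default); set device_placement=False
# to move data to devices manually instead.
accelerator = Accelerator(device_placement=True)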

7. Define the model and optimizer

Next we'll load a pre-trained model using AutoModelForSequenceClassification. For the optimizer, we'll use Adam to update the model parameters. Recall that Adam is a common go-to optimizer due to its versatility.
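For example, something like the following sketch, where the checkpoint name, number of labels, and learning rate are placeholders rather than values from this lesson.

from torch.optim import Adam
from transformers import AutoModelForSequenceClassification

# Load a pre-trained checkpoint with a sequence-classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",   # placeholder checkpoint
    num_labels=2,
)

# Adam updates the model parameters; the learning rate is a placeholder
optimizer = Adam(model.parameters(), lr=5e-5)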

8. Define the scheduler

A scheduler adjusts the optimizer's learning rate during training to help the model learn efficiently. This scheduler linearly increases the learning rate for num_warmup_steps (the warm-up period) and then linearly decreases it for the remaining steps; the total number of training steps is num_training_steps. If a scheduler is not defined, the learning rate is held constant.
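One way to set this up is with the get_scheduler helper from transformers; the warm-up length and epoch count below are placeholders, and dataloader is assumed to be defined already.

from transformers import get_scheduler

num_epochs = 3                                    # placeholder
num_training_steps = num_epochs * len(dataloader)

lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=100,                         # learning rate rises linearly over these steps
    num_training_steps=num_training_steps,        # then decays linearly to zero
)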

9. Prepare the model for efficient training

The accelerator's .prepare() method wraps the model, optimizer, dataloader, and scheduler for the current setup and automatically places them on the available devices.
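In code, this looks roughly like the following; .prepare() returns the wrapped objects in the same order they are passed in.

model, optimizer, dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, dataloader, lr_scheduler
)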

10. Building a training loop with Accelerator

Let's see how the PyTorch loop changes when using Accelerator. As before, the loop begins by zeroing the gradients. Previously, we moved data to the device.

11. Building a training loop with Accelerator

Since Accelerator handles device placement, we'll remove these lines for moving data to devices.

12. Building a training loop with Accelerator

Then the loop performs a forward pass and computes cross-entropy loss and gradients.

13. Building a training loop with Accelerator

We'll replace loss.backward() with accelerator.backward(loss), which computes gradients and synchronizes them across devices for distributed training. Finally, the loop calls optimizer.step() and scheduler.step() to update the model parameters and the learning rate. We've simplified the initial loop by letting Accelerator handle distributed training.
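Putting the pieces together, the accelerated loop might look like this sketch; compared with the earlier loop, the .to(device) calls are gone and loss.backward() has become accelerator.backward(loss). The names again assume the objects defined above.

model.train()
for batch in dataloader:
    optimizer.zero_grad()              # zero the gradients
    outputs = model(**batch)           # forward pass; Accelerator already placed model and data
    loss = outputs.loss                # cross-entropy loss from the model
    accelerator.backward(loss)         # compute and synchronize gradients across devices
    optimizer.step()                   # update model parameters
    lr_scheduler.step()                # update the learning rate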

14. Summary of changes

Let's recap. Before Accelerator, we had to move data to devices manually with the .to(device) method and compute gradients with loss.backward(). With Accelerator, automatic device placement and data parallelism are enabled by the accelerator.prepare() method, and accelerator.backward(loss) computes gradients and handles gradient synchronization in distributed setups. We can still customize the loop for advanced applications like training two networks. The updated code is more user-friendly, hardware-agnostic, scalable, and maintainable.

15. Let's practice!

Time to practice!