Implementing Local SGD with Accelerator

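So far, you have optimized the memory usage of your language translation model with gradient accumulation and gradient checkpointing. Training is still somewhat slow, however, so you decide to add local SGD to the training loop to make communication between devices more efficient. Complete the training loop with local SGD!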

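model, train_dataloader, and accelerator have been predefined, and LocalSGD has already been imported. The sample code also calls optimizer and lr_scheduler, which are presumably defined in the same environment.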

This exercise is part of the course

Efficient AI Model Training with PyTorch

Exercise instructions

Set local_sgd_steps so that gradients are synchronized every eight steps.
Step the local SGD context manager.
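
To make the eight-step setting concrete, here is a minimal pure-Python sketch of the idea behind local SGD. It is not Accelerate's implementation: the function simulate_local_sgd, the per-worker targets, and the learning rate are hypothetical, and each worker's "model" is a single scalar. The exercise text speaks of synchronizing gradients; this sketch assumes the common formulation in which workers update independently and average their parameters every local_sgd_steps steps.

```python
import random


def simulate_local_sgd(num_workers=4, total_steps=32, local_sgd_steps=8, lr=0.1, seed=0):
    """Toy local SGD: worker k minimizes (w - target_k)^2 on its own 'data shard'."""
    rng = random.Random(seed)
    # Hypothetical per-worker targets standing in for different data shards.
    targets = [rng.uniform(-1.0, 1.0) for _ in range(num_workers)]
    weights = [0.0] * num_workers  # every worker starts from the same parameters
    num_syncs = 0

    for step in range(1, total_steps + 1):
        # Local update: each worker steps on its own loss, with no communication.
        for k in range(num_workers):
            grad = 2.0 * (weights[k] - targets[k])
            weights[k] -= lr * grad

        # Communication happens only every local_sgd_steps steps:
        # all workers replace their parameters with the average.
        if step % local_sgd_steps == 0:
            mean_w = sum(weights) / num_workers
            weights = [mean_w] * num_workers
            num_syncs += 1

    return weights, num_syncs


weights, num_syncs = simulate_local_sgd()
print(num_syncs)  # 4 synchronizations in 32 steps
```

With local_sgd_steps=1 the same function synchronizes after every step, which corresponds to the communication pattern that local SGD is meant to reduce.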

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Set up a context manager to synchronize gradients every eight steps
with LocalSGD(accelerator=accelerator, model=model, local_sgd_steps=____, enabled=True) as local_sgd:
    for batch in train_dataloader:
        with accelerator.accumulate(model):
            inputs, targets = batch["input_ids"], batch["labels"]
            outputs = model(inputs, labels=targets)
            loss = outputs.loss
            accelerator.backward(loss)
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
            # Step the local SGD context manager
            local_sgd.____()
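
The role of the per-batch call on the context manager can be illustrated with a toy stand-in. The class ToyLocalSGD below is hypothetical and is not Accelerate's LocalSGD; it only assumes that the real helper keeps a step counter and triggers a synchronization whenever the counter reaches a multiple of local_sgd_steps. Under that assumption, forgetting the call inside the loop would mean the counter never advances and no periodic synchronization is triggered.

```python
class ToyLocalSGD:
    """Hypothetical stand-in that only records when a sync would be triggered."""

    def __init__(self, local_sgd_steps, enabled=True):
        self.local_sgd_steps = local_sgd_steps
        self.enabled = enabled
        self.num_steps = 0
        self.sync_points = []  # step numbers at which a sync would happen

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions raised in the loop

    def step(self):
        # Count one training step and record a sync at every multiple of local_sgd_steps.
        self.num_steps += 1
        if self.enabled and self.num_steps % self.local_sgd_steps == 0:
            self.sync_points.append(self.num_steps)


with ToyLocalSGD(local_sgd_steps=8) as toy:
    for _ in range(20):  # 20 hypothetical batches
        toy.step()

print(toy.sync_points)  # [8, 16]
```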
