Fine-tuning with TorchTune

1. Fine-tuning with TorchTune

Welcome back! In this video, we'll learn how to configure a fine-tuning job using TorchTune.

2. The components of TorchTune fine-tuning

Fine-tuning in TorchTune revolves around three components. The model includes the base architecture and pre-trained weights, and can be chosen from the available versions and parameter sizes. The dataset provides the data to be used for fine-tuning. Finally, the recipe is the configuration file that ties everything together: model, dataset, and training parameters. It can be customized to fit specific fine-tuning tasks, while its format ensures consistency and reproducibility.

3. The components of TorchTune fine-tuning

We've seen how to select a model by listing the available options with 'tune ls'. We've also explored how to prepare datasets to best fit our use cases. Now, let's see how to set up a custom TorchTune recipe. TorchTune recipes are written as YAML files.

4. The components of a TorchTune recipe

Here, batch_size is set to 4 to ensure efficient use of memory during training on the GPU device, which is specified as "cuda", and the model will train for 20 epochs. All outputs, including logs and checkpoints, will be stored in the directory specified in output_dir. The model component points to the architecture and model version: in this case, a 1-billion-parameter model from the Llama 3.2 series. For optimization, we select a memory-efficient variant of the Adam optimizer, with the learning rate set to 2e-5, a common starting value for fine-tuning large pre-trained models. The dataset is set to alpaca_dataset, a pre-configured dataset.
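To make these settings concrete, here is a minimal sketch of what such a recipe could look like. The component paths (the model builder, the memory-efficient optimizer, and the dataset) and the output directory are illustrative assumptions, and a complete recipe would also include sections such as the tokenizer and checkpointer.

```yaml
# Sketch of a custom TorchTune recipe with the settings described above.
# Component paths and the output directory are illustrative assumptions;
# a full recipe also needs tokenizer and checkpointer sections.
output_dir: ./finetune_output          # logs and checkpoints are stored here
device: cuda
batch_size: 4
epochs: 20

model:
  _component_: torchtune.models.llama3_2.llama3_2_1b   # 1B-parameter Llama 3.2 model

optimizer:
  _component_: bitsandbytes.optim.PagedAdamW8bit       # memory-efficient Adam-based optimizer (assumed choice)
  lr: 2e-5

dataset:
  _component_: torchtune.datasets.alpaca_dataset       # pre-configured Alpaca dataset
```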

5. Configuring TorchTune recipes

Note that the configurations listed here are only a few of the available parameters. To create and save a custom recipe file in our Python environment, we can use the PyYAML library to define the configuration as a dictionary and save it in YAML format. In the configuration, we define parameters such as batch size, device, epochs, and output directory as key-value pairs. We use a nested dictionary for the model section to specify the pre-trained model architecture we want to fine-tune. Additional components, like the dataset or optimizer, can be added in the same way. We then use the yaml.dump() function to save the dictionary to a .yaml file, in this example "custom_recipe.yaml".
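A minimal sketch of this workflow is shown below, assuming the PyYAML package is installed; the output directory value is an illustrative placeholder, while the keys mirror the ones described above.

```python
# Minimal sketch: define the recipe as a dictionary and save it as YAML.
import yaml  # PyYAML

config = {
    "batch_size": 4,
    "device": "cuda",
    "epochs": 20,
    "output_dir": "./finetune_output",  # placeholder output directory
    # Nested dictionary for the model component; additional components
    # such as the dataset or optimizer can be added the same way.
    "model": {
        "_component_": "torchtune.models.llama3_2.llama3_2_1b",
    },
}

# Write the dictionary to a YAML recipe file.
with open("custom_recipe.yaml", "w") as f:
    yaml.dump(config, f, default_flow_style=False)
```

Running this script produces custom_recipe.yaml, which we can pass to the fine-tuning command in the next step.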

6. Running custom fine-tuning

With our configurations ready to go, we can start the fine-tuning process. To do so, we run "tune run" with our custom recipe file. This command launches training using all the settings defined in our file. After launch, informative logs provide real-time updates on the fine-tuning process. Logs are written to a text file, while outputs such as checkpoints are saved to the output directory specified in the recipe. We also see confirmation that the model and tokenizer were initialized successfully. Finally, we see progress updates on the training metrics. Here, training is on epoch 1, step 52, with a current loss of 2.37. The progress bar shows that 52 out of 25,880 steps are completed. This output helps us monitor the training process and adjust parameters if necessary.
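As a sketch, the launch command could look like the following; the recipe name (full_finetune_single_device) is one possible choice and should match the fine-tuning method you want to run, while the YAML file is the one created above.

```bash
# Launch fine-tuning with the custom recipe file (recipe name is illustrative).
tune run full_finetune_single_device --config custom_recipe.yaml
```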

7. Let's practice!

Let's practice writing some fine-tuning recipes.
