
GPT models and training LLMs

1. GPT models and training LLMs

Welcome back to our exploration of ChatGPT!

2. The genesis of GPT

OpenAI launched the GPT model series in 2018. The first iteration, GPT-1, was a first-of-its-kind model boasting 117 million parameters. With GPT-4 reported to have around 1.76 trillion parameters, these models have catapulted our ability to generate not only human-like text but also images and audio.

3. The genesis of GPT

Fun fact: a model's parameters, also known as weights, can be thought of as the synapses in a biological neural network such as the brain. Generally, the more parameters, the better the model's performance.

4. Breaking down GPT

Now, "generative pre-trained transformer" can be a mouthful, so let's break it down into its components.

5. Generative

Generative refers to the model's ability to generate new text based on patterns it has learned from its training data. A generative language model like GPT can produce coherent text in response to a prompt rather than selecting a predefined response.
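To make "generative" concrete, here is a minimal sketch (not part of the course materials) using the open-source Hugging Face transformers library and the small GPT-2 model. It asks the model to continue a prompt rather than pick from predefined responses.

```python
# Minimal text-generation sketch using Hugging Face transformers and GPT-2.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model generates new text token by token, continuing the prompt
# based on patterns learned during training.
result = generator("The transformer architecture is", max_new_tokens=30)
print(result[0]["generated_text"])
```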

6. Pre-trained

Pre-trained means that the model has already been trained on a large amount of text data before being fine-tuned for specific tasks. This allows it to learn new tasks faster and achieve better results than training from scratch.
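As a brief illustration (again using the Hugging Face transformers library, purely as an example), contrast loading a pre-trained checkpoint with building the same architecture from scratch:

```python
# Pre-trained weights already encode patterns from large-scale text data,
# so further training (fine-tuning) converges faster and performs better.
from transformers import AutoConfig, AutoModelForCausalLM

pretrained_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Same architecture, randomly initialized weights: no knowledge of language yet.
config = AutoConfig.from_pretrained("gpt2")
scratch_model = AutoModelForCausalLM.from_config(config)
```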

7. Transformer

The transformer is the architecture used in GPT models. It is a type of neural network that has become the gold-standard architecture for language models. Unlike recurrent neural networks (RNNs), a transformer can effectively process long text sequences without losing information, because every token can attend to every other token.
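The heart of the transformer is attention. Here is a tiny NumPy sketch (illustrative only) of scaled dot-product attention, the operation that lets every token in a sequence look at every other token at once:

```python
import numpy as np

def attention(Q, K, V):
    # Similarity of each query with every key, scaled for numerical stability
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns scores into attention weights that sum to 1 per token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of all value vectors in the sequence
    return weights @ V

x = np.random.randn(4, 8)          # toy sequence: 4 tokens, 8-dimensional embeddings
print(attention(x, x, x).shape)    # (4, 8): every token attends to the whole sequence
```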

8. ChatGPT

What happens when we piece this all together? Well, we get ChatGPT, created by fine-tuning GPT-3 to follow instructions using human feedback, which better aligns the model's output with user intent.

9. How do we train LLMs?

Training state-of-the-art models like ChatGPT today can cost anywhere from $10M to $100M+. We can break this training process down into three phases: pre-training, fine-tuning, and reinforcement learning from human feedback, or RLHF.

10. Step 1 - Pre-training

The first phase involves pre-training, which can be thought of as compressing the Internet. It takes around 12 days to obtain a base model that understands the general contours of human language. The differences in training data are vast: GPT-2 was trained on 40 GB of web text, whereas GPT-3 was trained on 570 GB! That's a lot of information.
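Under the hood, pre-training is next-token prediction. This toy PyTorch sketch (an assumption for illustration, not OpenAI's actual code) shows the objective that gets minimized over billions of tokens:

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 16
tokens = torch.randint(0, vocab_size, (1, seq_len))   # a pretend snippet of text, as token IDs
logits = torch.randn(1, seq_len, vocab_size)          # pretend model predictions for each position

# Shift by one so that position t is scored on predicting token t+1
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss)  # minimizing this across a huge corpus yields the base model
```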

11. Step 2 - Fine-tuning

The next phase involves fine-tuning. We don't just want a document generator; we want a useful assistant that we can ask questions and get answers from. This step aligns the base model into an assistant model, enabling it to adapt to specific tasks or styles. While a base model is trained on internet documents, our assistant model is trained on ideal responses written by humans. Once over 100,000 ideal human responses have been collected, the base model is fine-tuned on this data.
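To give a feel for what that fine-tuning data looks like, here is a small made-up sketch (the examples and format are assumptions, not the real dataset). Conceptually, the objective stays the same as in pre-training, but the text now looks like prompts paired with ideal responses:

```python
# Made-up examples of supervised fine-tuning data: prompt + ideal human response.
sft_examples = [
    {"prompt": "Explain overfitting in one sentence.",
     "response": "Overfitting is when a model memorizes its training data and fails to generalize."},
    {"prompt": "Write a polite reply declining a meeting.",
     "response": "Thank you for the invitation; unfortunately, I can't attend this time."},
]

for example in sft_examples:
    # The model is trained to predict the response tokens that follow the prompt,
    # reusing the same next-token objective as pre-training.
    training_text = example["prompt"] + "\n" + example["response"]
```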

12. Step 3 - RLHF

The third and final phase involves reinforcement learning from human feedback, allowing us to arrive at a more performant assistant model. RLHF operates on a simple yet powerful premise: improvement through comparison. Think of it as presenting two pieces of advice to an expert and asking, "Which is better?" By repeating this process across millions of comparisons, the model learns what makes responses more helpful, accurate, or enjoyable.
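One common way to turn those comparisons into a training signal is a pairwise ranking loss on a reward model. The sketch below is an assumption for illustration, not OpenAI's implementation: the reward model is pushed to score the response humans preferred higher than the one they rejected.

```python
import torch
import torch.nn.functional as F

reward_chosen = torch.tensor([1.3])    # reward model's score for the preferred response
reward_rejected = torch.tensor([0.4])  # score for the rejected response

# Loss is small when the chosen response out-scores the rejected one
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
print(loss)  # repeated over millions of comparisons, this teaches "what better looks like"
```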

13. The role of AI in crafting comparison labels

Initially heavily reliant on human judgments, the RLHF process now integrates AI to assist in generating and even critiquing comparison labels. LLMs can create, review, and critique labels based on human instructions.

14. The role of AI in crafting comparison labels

This collaborative approach not only scales the process but also enriches the model's learning. You can picture human versus AI involvement in labeling as a slider, and it is moving further toward the AI side over time.

15. Let's practice!

Now let's apply our understanding of training LLMs in a practical exercise.