1. Active learning
Hi! We'll now explore the human-in-the-loop architecture and how it plays a crucial role in AI systems, especially in reinforcement learning with human feedback. We'll also learn to construct an active learning pipeline to collect data efficiently. Let's get started!
2. Human in the loop systems
In systems where human reviewers oversee the model's output,
3. Human in the loop systems
labeling large amounts of data can quickly become expensive. So, we need to carefully consider a strategy for data sampling.
4. Human in the loop systems
A naive approach would be to randomly sample from our unlabeled data, but what if the model is only struggling with two specific classes? Sampling from any other class would be a waste of resources.
5. Human in the loop systems
This is where active learning becomes valuable.
Active learning optimizes the learning process by selecting the samples that will enable our model to learn more efficiently, thereby reducing the labeling effort required. This is done through the presence of the human evaluator, or reviewer, in the system.
6. Active learning in RLHF
In the RLHF process, active learning is used to select data points with high uncertainty and focus the input from human evaluators on those data points only.
7. Active learning in RLHF
This is used to fine-tune and update the reward model to better reflect human preferences.
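As a small sketch of this idea: suppose a reward model outputs preference probabilities for pairs of responses. One common way to measure uncertainty is predictive entropy; the highest-entropy pairs are routed to human evaluators. The function name and the toy probabilities below are illustrative, not from any particular library.

```python
import numpy as np

def select_for_human_review(probs, k):
    """Pick the k samples whose predicted preference probabilities
    are most uncertain (highest entropy) for human labeling."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:]  # indices of the k most uncertain rows

# Toy reward-model outputs: P(response A preferred) vs. P(response B preferred)
probs = np.array([[0.99, 0.01],   # very confident
                  [0.55, 0.45],   # uncertain
                  [0.90, 0.10],   # fairly confident
                  [0.51, 0.49]])  # most uncertain
print(select_for_human_review(probs, 2))
```

Only the flagged pairs are sent to reviewers; the confident ones need no human attention.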
8. Active learning
Let's take a look at how active learning works. First, we create a test dataset to evaluate model performance at each step of the active learning loop. This involves randomly sampling from the unlabeled data pool to obtain a representative sample.
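For instance, a held-out test set can be carved out of the pool with a simple random sample. The pool size, split size, and seed below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

# Hypothetical unlabeled pool: 1,000 samples with 5 features
X_pool = rng.normal(size=(1000, 5))

# Randomly hold out 20% as a fixed test set for evaluating each iteration
test_idx = rng.choice(len(X_pool), size=200, replace=False)
mask = np.zeros(len(X_pool), dtype=bool)
mask[test_idx] = True
X_test, X_unlabeled = X_pool[mask], X_pool[~mask]
```

The test set stays fixed across iterations so that accuracy numbers from different steps of the loop are comparable.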
9. Active learning
Next, we initialize our model using any existing labels or by taking another random sample if no labels are available.
10. Active learning
The active learning loop consists of the following steps:
First, we train the model on the available labeled data.
Then, we score the remaining unlabeled data using the model. If the model is confident enough, the labels are generated by the model itself.
11. Active learning
Otherwise, a human in the loop is involved,
12. Active learning
and the labels are finally generated.
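A minimal sketch of this confidence-based routing, using scikit-learn's LogisticRegression as a stand-in model; the `route_labels` helper and the 0.9 threshold are illustrative assumptions, not part of any library.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def route_labels(model, X, threshold=0.9):
    """Auto-label samples the model is confident about;
    everything else is flagged for a human reviewer."""
    probs = model.predict_proba(X)
    confident = probs.max(axis=1) >= threshold  # assumed threshold
    auto_labels = model.predict(X[confident])
    return confident, auto_labels  # rows where confident is False go to the human

# Tiny illustrative model: class 0 near x=0, class 1 near x=5
X_train = np.array([[0.0], [0.5], [4.5], [5.0]])
y_train = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

X_new = np.array([[-3.0], [2.5], [8.0]])
confident, auto_labels = route_labels(model, X_new)
```

Samples far from the decision boundary get model-generated labels; the ambiguous ones in the middle are exactly the points worth a reviewer's time.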
13. Active learning pipeline with low confidence
Let's build an active learning pipeline that uses low-confidence sampling. This strategy selects data points where the model is least confident, allowing it to focus on the most challenging samples.
First, we initialize the active learner from the modAL library.
We set logistic regression as the estimator and uncertainty sampling as the query strategy.
Uncertainty sampling is one of the most common strategies in active learning. It selects data points where the model has the least confidence in its predictions.
The model starts training with the provided labeled data, X_labeled and y_labeled.
14. Active learning pipeline with low confidence
Next, we run a loop where the model selects the least confident samples from X_unlabeled, adds them to X_labeled, and appends the corresponding human-provided labels to y_labeled. This process repeats for a number of iterations chosen based on the size and complexity of the dataset; here, 10 for demonstration.
At each iteration, we remove labeled samples from the unlabeled pool to prevent duplicates.
As a result, the model continually improves by focusing on the most uncertain data, gradually requiring less human input while increasing its accuracy.
15. Let's practice!
We have learned how an active learning strategy can help us deliver a strong reward model using less data. Now, let's practice building the pipeline.