1. Transfer learning for text classification
Pre-trained models are trained on specific tasks, but we can use them for other tasks with transfer learning.
2. What is transfer learning?
Transfer learning is a powerful technique that reuses knowledge learned on one task to improve performance on a related task. It saves time, draws on expertise from different domains, and reduces the amount of data required.
For example, an English Literature teacher might find it easier to teach History due to the overlapping themes and narratives.
We'll explore its application in text classification.
3. Mechanics of transfer learning
Transfer learning for text classification works in three main steps. We begin with a pre-trained model, which has learned patterns and features from its original task, such as text translation.
4. Mechanics of transfer learning
Now, instead of starting from scratch, we transfer this learned knowledge to a new, but related task like sentiment analysis.
5. Mechanics of transfer learning
The final step is fine-tuning, where we adjust the model specifically for sentiment analysis by retraining the existing model on task-specific examples.
6. Mechanics of transfer learning
Finally, this creates a new model that can be used for sentiment analysis.
7. Pre-trained model: BERT
We have already used pre-trained models in the previous chapter. Now, with the help of transfer learning, we will take a language model, BERT, and fine-tune it for sentiment analysis.
BERT, or Bidirectional Encoder Representations from Transformers, is trained for language modeling.
It contains multiple layers of transformers and is pre-trained on large amounts of text.
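To make this concrete, here is a minimal sketch (not part of the course code) that loads the pre-trained bert-base-uncased checkpoint and inspects its configuration to confirm it stacks multiple transformer layers:

```python
from transformers import BertModel

# Load the pre-trained BERT encoder (bert-base-uncased was trained on
# large English corpora with a language-modeling objective)
bert = BertModel.from_pretrained("bert-base-uncased")

# Inspect the configuration: a stack of transformer layers with a fixed hidden size
print(bert.config.num_hidden_layers)  # 12 transformer layers
print(bert.config.hidden_size)        # 768-dimensional hidden states
```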
8. Hands-on: implementing BERT
We begin with texts labeled one for positive and zero for negative sentiment, used for training and testing.
We import BertTokenizer and BertForSequenceClassification from the transformers package; the latter is well suited to text classification and builds on PyTorch.
This is where transfer learning is pivotal. We initialize a BERT tokenizer and model using from_pretrained and bert-base-uncased, designed for English texts, with num_labels set to two for binary classification.
Subsequently, our texts are prepared using the tokenizer, which handles padding, truncation, and tensor conversion, capping the sequence length at 32. Labels are assigned to the preprocessed inputs, preparing them for training and evaluation in the BERT model.
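A minimal sketch of this setup is shown below; the example sentences and the variable names texts, labels, and inputs are illustrative assumptions rather than the course's exact code:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Illustrative training data: 1 = positive, 0 = negative (hypothetical examples)
texts = ["I loved this film!", "This was a waste of time."]
labels = [1, 0]

# Transfer learning step: load the pre-trained tokenizer and model,
# with num_labels=2 for binary sentiment classification
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Tokenize: pad/truncate to a maximum length of 32 and return PyTorch tensors
inputs = tokenizer(
    texts, padding=True, truncation=True, max_length=32, return_tensors="pt"
)

# Attach the labels so the model can compute a loss during training
inputs["labels"] = torch.tensor(labels)
```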
9. Fine-tuning BERT
We fine-tune using the AdamW optimizer, a variant of Adam, with a learning rate of 0.00001 for subtle parameter adjustments. Initially, we use one epoch to verify training, increasing it as needed to reduce the loss.
Our model enters training mode, following the same approach we used for other text models.
In each epoch, data is fed to the model by unpacking the input_eval variable. We then retrieve the loss from the model's loss attribute and compute the gradients with loss.backward().
Weights are adjusted with optimizer.step(), and gradients are reset with optimizer.zero_grad() so that gradients from previous steps don't interfere.
Observing the loss per epoch, which starts at 0.706 here, is crucial; the goal is for it to decrease over successive epochs.
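Putting these steps together, a minimal training loop might look like the following sketch. It assumes the inputs dictionary and model from the previous sketch; the epoch count and learning rate follow the values mentioned above:

```python
from torch.optim import AdamW

# AdamW with a small learning rate for subtle parameter adjustments
optimizer = AdamW(model.parameters(), lr=1e-5)

model.train()  # put the model into training mode

num_epochs = 1  # start with one epoch to verify training, then increase as needed
for epoch in range(num_epochs):
    # Unpack the tokenized inputs (input_ids, attention_mask, labels) into the model
    outputs = model(**inputs)
    loss = outputs.loss      # loss computed from the attached labels
    loss.backward()          # compute gradients
    optimizer.step()         # adjust the weights
    optimizer.zero_grad()    # reset gradients so past steps don't interfere
    print(f"Epoch {epoch + 1}, loss: {loss.item():.3f}")
```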
10. Evaluating on new text
Testing the fine-tuned BERT involves tokenizing new text, ensuring it complies with the model’s constraints using return_tensors, truncation, padding, and max_length.
The tokenized input is passed to the model by unpacking it with **input_eval.
Interpretation of the model’s output involves applying the softmax function to outputs_eval.logits, translating the logits, or raw model outputs, into probabilities between zero and one, with dim=-1 ensuring it is applied over the tensor's last dimension.
The text is classified as positive or negative according to whichever class has the highest probability.
Our model accurately identifies the text as positive.
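A sketch of this evaluation step, assuming the fine-tuned model and tokenizer from the sketches above; the new sentence is a hypothetical example:

```python
import torch

# Hypothetical new text to classify
new_text = "The plot was engaging and the acting was superb."

model.eval()  # switch to evaluation mode

# Tokenize with the same constraints used during training
input_eval = tokenizer(
    new_text, return_tensors="pt", truncation=True, padding=True, max_length=32
)

with torch.no_grad():
    outputs_eval = model(**input_eval)  # unpack the tokenized inputs into the model

# Convert logits to probabilities over the last dimension
probs = torch.softmax(outputs_eval.logits, dim=-1)

# The predicted sentiment is the class with the highest probability
prediction = "positive" if probs.argmax(dim=-1).item() == 1 else "negative"
print(prediction, probs)
```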
11. Let's practice!
Let's practice!