Creating a transformer model
At PyBooks, the recommendation engine you're working on needs more refined capabilities to understand the sentiments of user reviews. You believe that using transformers, a state-of-the-art architecture, can help achieve this. You decide to build a transformer model that can encode the sentiments in the reviews to kickstart the project.
The following packages have been imported for you: torch, nn, and optim.
The input data contains sentences such as "I love this product", "This is terrible", "Could be better", ... and their respective binary sentiment labels, such as 1, 0, 0, ...
The input data is split and converted to embeddings in the following variables: train_sentences, train_labels, test_sentences, test_labels, and token_embeddings.
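For orientation, here is a minimal sketch of how sentences and labels like these could be converted into the tensors the model expects. The word-level vocabulary, padding scheme, and embed_size value below are illustrative assumptions and are not part of the exercise setup.

import torch
import torch.nn as nn

# Toy data mirroring the exercise setup (assumed, for illustration only)
train_sentences = ["I love this product", "This is terrible", "Could be better"]
train_labels = [1, 0, 0]

# Assumed word-level vocabulary built from the training sentences
vocab = {word: idx for idx, word in enumerate(
    sorted({w for s in train_sentences for w in s.lower().split()}))}

# Assumed embedding layer; embed_size=512 matches the model defined below
embed_size = 512
embedding = nn.Embedding(len(vocab), embed_size)

# Convert each sentence to a padded row of token ids
# (index 0 is reused as padding here purely for simplicity)
max_len = max(len(s.split()) for s in train_sentences)
token_ids = torch.zeros(len(train_sentences), max_len, dtype=torch.long)
for i, sentence in enumerate(train_sentences):
    for j, word in enumerate(sentence.lower().split()):
        token_ids[i, j] = vocab[word]

token_embeddings = embedding(token_ids)   # shape: (3, max_len, 512)
labels = torch.tensor(train_labels)       # shape: (3,)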
This exercise is part of the course Deep Learning for Text with PyTorch.
Exercise instructions
- Initialize the transformer encoder.
- Define the fully connected layer based on the number of sentiment classes.
- In the forward method, pass the input through the transformer encoder followed by the linear layer.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
class TransformerEncoder(nn.Module):
    def __init__(self, embed_size, heads, num_layers, dropout):
        super(TransformerEncoder, self).__init__()
        # Initialize the encoder
        self.encoder = nn.____(
            nn.____(d_model=embed_size, nhead=heads),
            num_layers=num_layers)
        # Define the fully connected layer
        self.fc = nn.Linear(embed_size, ____)

    def forward(self, x):
        # Pass the input through the transformer encoder
        x = self.____(x)
        x = x.mean(dim=1)
        return self.fc(x)

model = TransformerEncoder(embed_size=512, heads=8, num_layers=3, dropout=0.5)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
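One way the blanks could be completed is sketched below; treat it as a hedged solution sketch rather than the official answer. The stack is built from nn.TransformerEncoderLayer modules wrapped in nn.TransformerEncoder, and the final linear layer maps the pooled embedding to the two sentiment classes. Passing dropout=dropout and batch_first=True to the encoder layer goes beyond the exercise blanks: the first makes use of the constructor's dropout argument, and the second makes the mean over dim=1 pool across tokens for (batch, seq_len, embed_size) inputs.

import torch
import torch.nn as nn
import torch.optim as optim

class TransformerEncoder(nn.Module):
    def __init__(self, embed_size, heads, num_layers, dropout):
        super(TransformerEncoder, self).__init__()
        # Stack of self-attention encoder layers
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=embed_size, nhead=heads,
                                       dropout=dropout, batch_first=True),
            num_layers=num_layers)
        # Two output units, one per sentiment class (0 and 1)
        self.fc = nn.Linear(embed_size, 2)

    def forward(self, x):
        # x: (batch, seq_len, embed_size) -> encode, then mean-pool over tokens
        x = self.encoder(x)
        x = x.mean(dim=1)
        return self.fc(x)

model = TransformerEncoder(embed_size=512, heads=8, num_layers=3, dropout=0.5)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# A single assumed training step on embeddings shaped (batch, seq_len, 512)
# and integer labels shaped (batch,):
#   optimizer.zero_grad()
#   loss = criterion(model(token_embeddings), labels)
#   loss.backward()
#   optimizer.step()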