Feed-forward sublayers
Feed-forward sub-layers transform attention outputs into richer, nonlinear representations, helping the model capture more complex relationships.
In this exercise, you'll create a FeedForwardSubLayer for your encoder-only transformer. This layer consists of two linear layers with a ReLU activation function between them. It takes two parameters, d_model and d_ff, which represent the dimensionality of the input embeddings and the dimensionality between the linear layers, respectively. d_model and d_ff are already available for you to use.
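Concretely, the sub-layer expands each embedding from d_model to d_ff, applies ReLU, and projects it back to d_model, so the output shape matches the input shape. A minimal sketch of that data flow (the variable names here are illustrative only, not part of the exercise):

fc1 = nn.Linear(d_model, d_ff)   # expand: d_model -> d_ff
fc2 = nn.Linear(d_ff, d_model)   # project back: d_ff -> d_model
output = fc2(nn.ReLU()(fc1(x)))  # output has the same shape as x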
Exercise instructions
- Define the first and second linear layers and ReLU activation for the feed-forward sub-layer class, using d_model and a dimension d_ff between layers.
- Pass the input through the layers and activation function in the forward() method.
- Instantiate the FeedForwardSubLayer using the d_model and d_ff provided (set to 512 and 2048, respectively) and apply it to the input embeddings, x.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import torch.nn as nn

class FeedForwardSubLayer(nn.Module):
def __init__(self, d_model, d_ff):
super().__init__()
# Define the layers and activation
self.fc1 = ____
self.fc2 = ____
self.relu = ____
def forward(self, x):
# Pass the input through the layers and activation
return self.____(self.____(self.____(x)))
# Instantiate the FeedForwardSubLayer and apply it to x
feed_forward = ____
output = ____
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")