Feed-forward sublayers
Feed-forward sublayers apply a position-wise nonlinear transformation to the attention outputs, helping the model capture more complex relationships.
In this exercise, you'll create a FeedForwardSubLayer for your encoder-only transformer. This layer consists of two linear layers with a ReLU activation function between them. Its constructor takes two parameters, d_model and d_ff, which represent the dimensionality of the input embeddings and the dimension between the linear layers, respectively.
d_model and d_ff are already available for you to use.
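For reference, this is the standard position-wise feed-forward block from the Transformer architecture, FFN(x) = ReLU(x·W1 + b1)·W2 + b2, which expands each token embedding from d_model to d_ff and projects it back to d_model. The minimal sketch below illustrates that dimension flow; the tensor shapes used here are illustrative assumptions, not values taken from the exercise.

import torch
import torch.nn as nn

# Illustrative dimension flow only: d_model -> d_ff -> d_model
d_model, d_ff = 512, 2048
fc1 = nn.Linear(d_model, d_ff)   # expand each embedding to d_ff
fc2 = nn.Linear(d_ff, d_model)   # project back down to d_model

x = torch.randn(2, 10, d_model)  # assumed shape: (batch_size, seq_length, d_model)
out = fc2(torch.relu(fc1(x)))    # Linear -> ReLU -> Linear
print(out.shape)                 # torch.Size([2, 10, 512])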
Exercise instructions
- Define the first and second linear layers and ReLU activation for the feed-forward sublayer class, using d_model and a dimension d_ff between layers.
- Pass the input through the layers and activation function in the forward() method.
- Instantiate the FeedForwardSubLayer using the d_model and d_ff provided (set to 512 and 2048, respectively) and apply it to the input embeddings, x.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import torch.nn as nn

class FeedForwardSubLayer(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        # Define the layers and activation
        self.fc1 = ____
        self.fc2 = ____
        self.relu = ____

    def forward(self, x):
        # Pass the input through the layers and activation
        return self.____(self.____(self.____(x)))
# Instantiate the FeedForwardSubLayer and apply it to x
feed_forward = ____
output = ____
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")