
Feed-forward sublayers

Feed-forward sublayers map attention outputs into richer nonlinear representations, helping the model capture complex relationships.

In this exercise, you'll create a FeedForwardSubLayer for your encoder-only transformer. This layer consists of two linear layers with a ReLU activation function between them. It takes two parameters, d_model and d_ff, which represent the dimensionality of the input embeddings and the hidden dimension between the two linear layers, respectively.

d_model and d_ff are already available for you to use.
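To make the dimensions concrete, here is a minimal sketch of how the two linear layers map d_model up to d_ff and back; the input tensor and its shape are illustrative assumptions, not part of the exercise:

import torch
import torch.nn as nn

d_model, d_ff = 512, 2048             # values used in this exercise
fc1 = nn.Linear(d_model, d_ff)        # expands d_model -> d_ff
fc2 = nn.Linear(d_ff, d_model)        # projects d_ff back to d_model
x = torch.randn(8, 16, d_model)       # hypothetical (batch, seq_len, d_model) input
print(fc2(torch.relu(fc1(x))).shape)  # torch.Size([8, 16, 512]): shape is preserved

Because the output shape matches the input shape, this block can be stacked after attention in every encoder layer.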

This exercise is part of the course Transformer Models with PyTorch.


Exercise instructions

  • Define the first and second linear layers and the ReLU activation for the feed-forward sublayer class, using d_model and the hidden dimension d_ff between them.
  • Pass the input through the layers and activation function in the forward() method.
  • Instantiate the FeedForwardSubLayer using d_model and d_ff provided (set to 512 and 2048, respectively) and apply it to the input embeddings, x.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

import torch.nn as nn

class FeedForwardSubLayer(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        # Define the layers and activation
        self.fc1 = ____
        self.fc2 = ____
        self.relu = ____

    def forward(self, x):
        # Pass the input through the layers and activation
        return self.____(self.____(self.____(x)))
    
# Instantiate the FeedForwardSubLayer and apply it to x
feed_forward = ____
output = ____
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
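For reference, one possible completion might look like the sketch below. The input tensor x here is a hypothetical stand-in for the embeddings provided by the exercise environment, assumed to have shape (batch_size, seq_len, d_model):

import torch
import torch.nn as nn

class FeedForwardSubLayer(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)  # expand to the hidden dimension
        self.fc2 = nn.Linear(d_ff, d_model)  # project back to d_model
        self.relu = nn.ReLU()

    def forward(self, x):
        # fc1 -> ReLU -> fc2, applied position-wise to each token
        return self.fc2(self.relu(self.fc1(x)))

d_model, d_ff = 512, 2048
x = torch.randn(4, 10, d_model)  # hypothetical stand-in for the provided embeddings
feed_forward = FeedForwardSubLayer(d_model, d_ff)
output = feed_forward(x)
print(f"Input shape: {x.shape}")        # torch.Size([4, 10, 512])
print(f"Output shape: {output.shape}")  # torch.Size([4, 10, 512])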