Feed-forward sublayers
Feed-forward sublayers apply a position-wise nonlinear transformation to the attention outputs, helping the model capture more complex relationships.
In this exercise, you'll create a FeedForwardSubLayer for your encoder-only transformer. This layer consists of two linear layers with a ReLU activation function between them. Its constructor takes two parameters, d_model and d_ff, which represent the dimensionality of the input embeddings and the dimension between the linear layers, respectively.
d_model and d_ff are already available for you to use.
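For reference, this is the standard position-wise feed-forward block from the Transformer architecture, FFN(x) = ReLU(x·W1 + b1)·W2 + b2, which expands each token embedding from d_model to d_ff and projects it back to d_model. The minimal sketch below illustrates that dimension flow; the tensor shapes used here are illustrative assumptions, not values taken from the exercise.

import torch
import torch.nn as nn

# Illustrative dimension flow only: d_model -> d_ff -> d_model
d_model, d_ff = 512, 2048
fc1 = nn.Linear(d_model, d_ff)   # expand each embedding to d_ff
fc2 = nn.Linear(d_ff, d_model)   # project back down to d_model

x = torch.randn(2, 10, d_model)  # assumed shape: (batch_size, seq_length, d_model)
out = fc2(torch.relu(fc1(x)))    # Linear -> ReLU -> Linear
print(out.shape)                 # torch.Size([2, 10, 512])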
Exercise instructions
- Define the first and second linear layers and ReLU activation for the feed-forward sublayer class, using d_model and a dimension d_ff between layers.
- Pass the input through the layers and activation function in the forward() method.
- Instantiate the FeedForwardSubLayer using the d_model and d_ff provided (set to 512 and 2048, respectively) and apply it to the input embeddings, x.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import torch.nn as nn

class FeedForwardSubLayer(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        # Define the layers and activation
        self.fc1 = ____
        self.fc2 = ____
        self.relu = ____

    def forward(self, x):
        # Pass the input through the layers and activation
        return self.____(self.____(self.____(x)))
# Instantiate the FeedForwardSubLayer and apply it to x
feed_forward = ____
output = ____
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")