Transformer Models with PyTorch


Exercise

Feed-forward sublayers

Feed-forward sublayers transform attention outputs into richer nonlinear representations, helping the model capture complex relationships.

In this exercise, you'll create a FeedForwardSubLayer for your encoder-only transformer. This layer consists of two linear layers with a ReLU activation between them. It takes two parameters, d_model and d_ff, which set the dimensionality of the input embeddings and the hidden dimension between the two linear layers, respectively.

d_model and d_ff are already available for you to use.

Instructions

100 XP
  • Define the first and second linear layers and the ReLU activation for the feed-forward sublayer class, using d_model and a hidden dimension d_ff between the layers.
  • Pass the input through the layers and activation function in the forward() method.
  • Instantiate the FeedForwardSubLayer using d_model and d_ff provided (set to 512 and 2048, respectively) and apply it to the input embeddings, x.
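For reference, the steps above might come together like this. This is a minimal sketch, not the exercise's official solution: the attribute names (fc1, fc2, relu) and the example input shape for x are my own choices, while d_model, d_ff, and the Linear → ReLU → Linear structure come from the exercise itself.

```python
import torch
import torch.nn as nn

class FeedForwardSubLayer(nn.Module):
    """Position-wise feed-forward sublayer: Linear -> ReLU -> Linear."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)   # expand d_model -> d_ff
        self.fc2 = nn.Linear(d_ff, d_model)   # project d_ff -> d_model
        self.relu = nn.ReLU()

    def forward(self, x):
        # Pass the input through the first layer, the activation,
        # then the second layer
        return self.fc2(self.relu(self.fc1(x)))

d_model, d_ff = 512, 2048
feed_forward = FeedForwardSubLayer(d_model, d_ff)

# Hypothetical input embeddings: (batch_size, seq_len, d_model)
x = torch.randn(8, 10, d_model)
output = feed_forward(x)
print(output.shape)  # torch.Size([8, 10, 512])
```

Note that the output keeps the shape of the input: the sublayer widens each position's representation to d_ff internally, then projects it back to d_model so it can feed into the next encoder sublayer.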