
The decoder layer

Like encoder transformers, decoder transformers are built from multiple layers that use multi-head attention and feed-forward sublayers. Have a go at combining these components to build a DecoderLayer class.

The MultiHeadAttention and FeedForwardSubLayer classes are available for you to use, along with the tgt_mask you created.

This exercise is part of the course Transformer Models with PyTorch.

Exercise instructions

Complete the forward() method to pass the input embeddings through the layers defined in the __init__ method:

  • Perform the attention calculation, passing the input embeddings, x, as the query, key, and value matrices along with the tgt_mask provided.
  • Apply dropout and the first layer normalization, norm1.
  • Perform the pass through the feed-forward sublayer, ff_sublayer.
  • Apply dropout and the second layer normalization, norm2.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        self.ff_sublayer = FeedForwardSubLayer(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, tgt_mask):
        # Perform the attention calculation
        attn_output = self.____
        # Apply dropout and the first layer normalization
        x = self.____(x + self.____(attn_output))
        # Pass through the feed-forward sublayer
        ff_output = self.____(x)
        # Apply dropout and the second layer normalization
        x = self.____(x + self.____(ff_output))
        return x
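
For reference, a filled-in forward() could look like the sketch below. It assumes MultiHeadAttention is called with the query, key, value, and mask arguments in that order, as the instructions above suggest; check the class signature in your environment if it differs.

    def forward(self, x, tgt_mask):
        # Masked self-attention: x is used as the query, key, and value
        attn_output = self.self_attn(x, x, x, tgt_mask)
        # Residual connection with dropout, then the first layer normalization
        x = self.norm1(x + self.dropout(attn_output))
        # Position-wise feed-forward sublayer
        ff_output = self.ff_sublayer(x)
        # Residual connection with dropout, then the second layer normalization
        x = self.norm2(x + self.dropout(ff_output))
        return x

With the blanks filled in, passing a (batch_size, seq_len, d_model) tensor and the matching tgt_mask through the layer returns a tensor of the same shape.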