Exercise

The decoder layer

Like encoder transformers, decoder transformers are built from multiple layers that combine multi-head attention and feed-forward sublayers. Have a go at combining these components to build a DecoderLayer class.

The MultiHeadAttention and FeedForwardSubLayer classes are available for you to use, along with the tgt_mask you created.

Instructions

100 XP

Complete the forward() method to pass the input embeddings through the layers defined in the __init__ method (a reference sketch follows these steps):

  • Perform the attention calculation using the tgt_mask provided and the input embeddings, x, for the query, key, and value matrices.
  • Apply dropout and the first layer normalization, norm1.
  • Perform the pass through the feed-forward sublayer, ff_sublayer.
  • Apply dropout and the second layer normalization, norm2.
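For reference, here is a minimal sketch of what the completed layer might look like. The exercise only states that MultiHeadAttention and FeedForwardSubLayer are available, so the stand-in definitions and constructor signatures below (d_model, num_heads, d_ff) are assumptions built on PyTorch's nn.MultiheadAttention, and the residual additions inside norm1 and norm2 follow the standard transformer pattern even though the steps above don't spell them out:

import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    # Stand-in for the course-provided class; the real implementation
    # may differ. Wraps PyTorch's built-in multi-head attention.
    def __init__(self, d_model, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, query, key, value, mask=None):
        # nn.MultiheadAttention treats True in attn_mask as a
        # position that is NOT allowed to attend
        out, _ = self.attn(query, key, value, attn_mask=mask)
        return out

class FeedForwardSubLayer(nn.Module):
    # Stand-in: a two-layer position-wise feed-forward network
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)

class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        self.ff_sublayer = FeedForwardSubLayer(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, tgt_mask):
        # Masked self-attention: x serves as query, key, and value
        attn_output = self.self_attn(x, x, x, tgt_mask)
        # Dropout, residual connection, then the first layer norm
        x = self.norm1(x + self.dropout(attn_output))
        # Position-wise feed-forward sublayer
        ff_output = self.ff_sublayer(x)
        # Dropout, residual connection, then the second layer norm
        x = self.norm2(x + self.dropout(ff_output))
        return x

# Usage: a causal target mask blocks attention to future positions.
# Here True marks blocked positions, matching nn.MultiheadAttention.
d_model, num_heads, d_ff, dropout = 512, 8, 2048, 0.1
layer = DecoderLayer(d_model, num_heads, d_ff, dropout)
x = torch.randn(2, 10, d_model)  # (batch, seq_len, d_model)
tgt_mask = torch.triu(torch.ones(10, 10, dtype=torch.bool), diagonal=1)
out = layer(x, tgt_mask)         # shape: (2, 10, 512)

Note that mask conventions vary: this stand-in treats True as a blocked position (PyTorch's built-in convention), while the course's own MultiHeadAttention and the tgt_mask you created may use the opposite convention, with zeros marking blocked positions.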