

Exercise

Adding cross-attention to the decoder layer

To integrate the encoder and decoder stacks you've defined previously into an encoder-decoder transformer, you need to create a cross-attention mechanism to act as a bridge between the two.

The MultiHeadAttention class you defined previously is still available.

Instructions

  • Define a cross-attention mechanism (using MultiHeadAttention) and a third layer normalization (using nn.LayerNorm) in the __init__ method.
  • Complete the forward pass to add cross-attention to the decoder layer (a sketch of one possible completion follows below).
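
Below is a minimal sketch of what the completed layer could look like. Everything beyond the exercise prompt is an assumption: the course's MultiHeadAttention isn't shown here, so a thin wrapper around torch.nn.MultiheadAttention stands in for it, and the feed-forward sublayer, the argument names (encoder_output, tgt_mask, cross_mask), and the post-norm sublayer ordering follow the common transformer layout rather than the course's exact code.

```python
import torch
import torch.nn as nn

# Stand-in for the course's MultiHeadAttention class, whose code isn't
# shown here. Any module with forward(query, key, value, mask) that
# returns a (batch, seq_len, d_model) tensor would slot in the same way.
class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, query, key, value, mask=None):
        out, _ = self.attn(query, key, value, attn_mask=mask)
        return out

class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        # Cross-attention bridges the two stacks: queries come from the
        # decoder, keys and values from the encoder output.
        self.cross_attn = MultiHeadAttention(d_model, num_heads)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        # One LayerNorm per sublayer; the cross-attention sublayer is
        # what makes a third normalization necessary.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, encoder_output, tgt_mask=None, cross_mask=None):
        # Masked self-attention over the decoder input, then add & norm.
        attn_out = self.self_attn(x, x, x, tgt_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Cross-attention: decoder states attend to the encoder output.
        attn_out = self.cross_attn(x, encoder_output, encoder_output, cross_mask)
        x = self.norm2(x + self.dropout(attn_out))
        # Position-wise feed-forward, then the final add & norm.
        return self.norm3(x + self.dropout(self.ff(x)))
```

A quick shape check (batch of 2, target length 10, source length 12):

```python
layer = DecoderLayer(d_model=512, num_heads=8, d_ff=2048, dropout=0.1)
tgt = torch.randn(2, 10, 512)      # decoder input embeddings
memory = torch.randn(2, 12, 512)   # encoder stack output
print(layer(tgt, memory).shape)    # torch.Size([2, 10, 512])
```

Note that the residual-plus-norm pattern repeats around each sublayer; cross-attention differs from self-attention only in where its keys and values come from.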