Transformer Models with PyTorch
Exercise

Adding cross-attention to the decoder layer

To combine the encoder and decoder stacks you've defined previously into an encoder-decoder transformer, you need a cross-attention mechanism to act as a bridge between the two: the decoder's queries attend over the encoder's output.

The MultiHeadAttention class you defined previously is still available.

Instructions

100 XP
  • Define a cross-attention mechanism (using MultiHeadAttention) and a third layer normalization (using nn.LayerNorm) in the __init__ method.
  • Complete the forward pass to add cross-attention to the decoder layer.
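The steps above can be sketched as follows. Since the course's `MultiHeadAttention` class isn't shown here, this sketch stands in a thin wrapper around `nn.MultiheadAttention` with an assumed `forward(query, key, value, mask)` interface, plus a simple feed-forward sub-layer; the exact signatures in the exercise may differ.

```python
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    # Stand-in for the course's MultiHeadAttention class (assumed interface:
    # forward(query, key, value, mask) returns a tensor shaped like the query).
    def __init__(self, d_model, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, query, key, value, mask=None):
        out, _ = self.attn(query, key, value, attn_mask=mask)
        return out


class FeedForwardSubLayer(nn.Module):
    # Position-wise feed-forward network (hypothetical helper for this sketch)
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)


class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        # Cross-attention bridges the decoder to the encoder output
        self.cross_attn = MultiHeadAttention(d_model, num_heads)
        self.ff = FeedForwardSubLayer(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Third layer norm, applied after the feed-forward sub-layer
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, encoder_output, tgt_mask=None, cross_mask=None):
        # Masked self-attention over the decoder inputs
        self_attn_out = self.self_attn(x, x, x, tgt_mask)
        x = self.norm1(x + self.dropout(self_attn_out))
        # Cross-attention: queries come from the decoder, keys and
        # values come from the encoder output
        cross_attn_out = self.cross_attn(
            x, encoder_output, encoder_output, cross_mask
        )
        x = self.norm2(x + self.dropout(cross_attn_out))
        ff_out = self.ff(x)
        return self.norm3(x + self.dropout(ff_out))
```

Note that only the query comes from the decoder in the cross-attention call; the keys and values both come from `encoder_output`, which is how information flows from the source sequence into the target sequence.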