

Exercise

Adding cross-attention to the decoder layer

To integrate the encoder and decoder stacks you've defined previously into an encoder-decoder transformer, you need to create a cross-attention mechanism to act as a bridge between the two.

The MultiHeadAttention class you defined previously is still available.

Instructions

  • Define a cross-attention mechanism (using MultiHeadAttention) and a third layer normalization (using nn.LayerNorm) in the __init__ method.
  • Complete the forward pass to add cross-attention to the decoder layer (a sketch of one possible completion follows below).
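
Below is a minimal sketch of what the completed layer could look like. Everything beyond the exercise prompt is an assumption: the course's MultiHeadAttention isn't shown here, so a thin wrapper around torch.nn.MultiheadAttention stands in for it, and the feed-forward sublayer, the argument names (encoder_output, tgt_mask, cross_mask), and the post-norm sublayer ordering follow the common transformer layout rather than the course's exact code.

```python
import torch
import torch.nn as nn

# Stand-in for the course's MultiHeadAttention class, whose code isn't
# shown here. Any module with forward(query, key, value, mask) that
# returns a (batch, seq_len, d_model) tensor would slot in the same way.
class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, query, key, value, mask=None):
        out, _ = self.attn(query, key, value, attn_mask=mask)
        return out

class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        # Cross-attention bridges the two stacks: queries come from the
        # decoder, keys and values from the encoder output.
        self.cross_attn = MultiHeadAttention(d_model, num_heads)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        # One LayerNorm per sublayer; the cross-attention sublayer is
        # what makes a third normalization necessary.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, encoder_output, tgt_mask=None, cross_mask=None):
        # Masked self-attention over the decoder input, then add & norm.
        attn_out = self.self_attn(x, x, x, tgt_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Cross-attention: decoder states attend to the encoder output.
        attn_out = self.cross_attn(x, encoder_output, encoder_output, cross_mask)
        x = self.norm2(x + self.dropout(attn_out))
        # Position-wise feed-forward, then the final add & norm.
        return self.norm3(x + self.dropout(self.ff(x)))
```

A quick shape check (batch of 2, target length 10, source length 12):

```python
layer = DecoderLayer(d_model=512, num_heads=8, d_ff=2048, dropout=0.1)
tgt = torch.randn(2, 10, 512)      # decoder input embeddings
memory = torch.randn(2, 12, 512)   # encoder stack output
print(layer(tgt, memory).shape)    # torch.Size([2, 10, 512])
```

Note that the residual-plus-norm pattern repeats around each sublayer; cross-attention differs from self-attention only in where its keys and values come from.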