The decoder layer
Like encoder transformers, decoder transformers are built from multiple layers that make use of multi-head attention and feed-forward sublayers. Have a go at combining these components to build a DecoderLayer class. The MultiHeadAttention and FeedForwardSubLayer classes are available for you to use, along with the tgt_mask you created.
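The target mask keeps the decoder from attending to future tokens. If you need a reminder of how it can be built, the sketch below shows a lower-triangular causal mask; seq_len is an illustrative assumption, and your earlier exercise may have constructed tgt_mask slightly differently:

import torch

seq_len = 10  # illustrative sequence length (assumed)
# Ones on and below the diagonal: position i can attend only to positions <= i
tgt_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()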
This exercise is part of the course Transformer Models with PyTorch.
Instructions
Complete the forward() method to pass the input embeddings through the layers defined in the __init__ method:

- Perform the attention calculation using the tgt_mask provided and the input embeddings, x, for the query, key, and value matrices.
- Apply dropout and the first layer normalization, norm1.
- Pass the result through the feed-forward sublayer, ff_sublayer.
- Apply dropout and the second layer normalization, norm2.
Hands-on interactive exercise
Try this exercise by completing the sample code below.
class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        self.ff_sublayer = FeedForwardSubLayer(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, tgt_mask):
        # Perform the attention calculation
        attn_output = self.____
        # Apply dropout and the first layer normalization
        x = self.____(x + self.____(attn_output))
        # Pass through the feed-forward sublayer
        ff_output = self.____(x)
        # Apply dropout and the second layer normalization
        x = self.____(x + self.____(ff_output))
        return x
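For reference, the blanks fill in as sketched below. This assumes the course-provided MultiHeadAttention is called with the query, key, and value tensors followed by the mask, and that FeedForwardSubLayer takes a single input tensor; adjust the calls if your implementations differ.

    def forward(self, x, tgt_mask):
        # Masked self-attention: x is used as query, key, and value
        attn_output = self.self_attn(x, x, x, tgt_mask)
        # Residual connection, dropout, then the first layer normalization
        x = self.norm1(x + self.dropout(attn_output))
        # Pass through the feed-forward sublayer
        ff_output = self.ff_sublayer(x)
        # Residual connection, dropout, then the second layer normalization
        x = self.norm2(x + self.dropout(ff_output))
        return x

With the layer complete, a quick check confirms that the output keeps the input's shape; the sizes below are illustrative assumptions, and MultiHeadAttention and FeedForwardSubLayer are assumed to be defined from the earlier exercises:

import torch

batch_size, seq_len, d_model = 2, 10, 512  # assumed values
x = torch.randn(batch_size, seq_len, d_model)
tgt_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
decoder_layer = DecoderLayer(d_model=d_model, num_heads=8, d_ff=2048, dropout=0.1)
output = decoder_layer(x, tgt_mask)
print(output.shape)  # torch.Size([2, 10, 512])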