The decoder layer
Like encoder transformers, decoder transformers are built from multiple layers that use multi-head attention and feed-forward sublayers. Have a go at combining these components to build a DecoderLayer class.
The MultiHeadAttention and FeedForwardSubLayer classes are available for you to use, along with the tgt_mask you created; a sketch of how such a causal mask might be built is shown below.
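For reference, tgt_mask is typically a causal (lower-triangular) mask that prevents each position from attending to later positions. A minimal sketch of how such a mask could be constructed, assuming a sequence length seq_len; the exact shape and dtype expected by MultiHeadAttention may differ in the exercise environment:

import torch

# Hypothetical causal mask: position i may only attend to positions 0..i
seq_len = 10
tgt_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()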
This exercise is part of the course
Transformer Models with PyTorch
Exercise instructions
Complete the forward() method to pass the input embeddings through the layers defined in the __init__ method:
- Perform the attention calculation using the tgt_mask provided and the input embeddings, x, for the query, key, and value matrices.
- Apply dropout and the first layer normalization, norm1.
- Perform the pass through the feed-forward sublayer, ff_sublayer.
- Apply dropout and the second layer normalization, norm2.
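The MultiHeadAttention and FeedForwardSubLayer classes are provided by the exercise environment and are not shown on this page. If you want to run the code outside the course, the sketch below shows one plausible implementation: the class names and constructor arguments come from the exercise, but the internals (standard scaled dot-product attention and a two-layer position-wise network) are assumptions.

import torch
import torch.nn as nn

class FeedForwardSubLayer(nn.Module):
    # Position-wise feed-forward network: two linear layers with a ReLU in between
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

class MultiHeadAttention(nn.Module):
    # Scaled dot-product attention split across num_heads heads
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.q_linear = nn.Linear(d_model, d_model)
        self.k_linear = nn.Linear(d_model, d_model)
        self.v_linear = nn.Linear(d_model, d_model)
        self.out_linear = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)
        # Project, then reshape to (batch, heads, seq_len, head_dim)
        q = self.q_linear(query).view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_linear(key).view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_linear(value).view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
        # Attention scores; masked positions are set to -inf before the softmax
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.head_dim ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn_weights = torch.softmax(scores, dim=-1)
        # Recombine the heads into (batch, seq_len, d_model)
        out = torch.matmul(attn_weights, v).transpose(1, 2).contiguous()
        out = out.view(batch_size, -1, self.num_heads * self.head_dim)
        return self.out_linear(out)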
Hands-on interactive exercise
Try this exercise by completing the sample code below.
class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        self.ff_sublayer = FeedForwardSubLayer(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, tgt_mask):
        # Perform the attention calculation
        attn_output = self.____
        # Apply dropout and the first layer normalization
        x = self.____(x + self.____(attn_output))
        # Pass through the feed-forward sublayer
        ff_output = self.____(x)
        # Apply dropout and the second layer normalization
        x = self.____(x + self.____(ff_output))
        return x
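One way to fill in the blanks, assuming self_attn's forward accepts the query, key, and value tensors followed by the mask (as in the sketch above). Once these lines replace the blanks, the layer can be exercised with dummy tensors:

    def forward(self, x, tgt_mask):
        # Masked self-attention: x supplies the query, key, and value matrices
        attn_output = self.self_attn(x, x, x, tgt_mask)
        # Residual connection, dropout, and the first layer normalization
        x = self.norm1(x + self.dropout(attn_output))
        # Position-wise feed-forward sublayer
        ff_output = self.ff_sublayer(x)
        # Residual connection, dropout, and the second layer normalization
        x = self.norm2(x + self.dropout(ff_output))
        return x

# Illustrative usage with dummy tensors (dimensions chosen arbitrarily)
d_model, num_heads, d_ff, dropout = 512, 8, 2048, 0.1
batch_size, seq_len = 2, 10
layer = DecoderLayer(d_model, num_heads, d_ff, dropout)
x = torch.randn(batch_size, seq_len, d_model)
tgt_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
output = layer(x, tgt_mask)
print(output.shape)  # torch.Size([2, 10, 512])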