Designing a mask for self-attention
To ensure that the decoder can learn to predict tokens, it's important to mask future tokens when modeling the input sequences. You'll build a mask in the form of a triangular matrix of True and False values, with False values above the main diagonal to exclude future tokens.
This exercise is part of the course
Transformer Models with PyTorch
Exercise instructions
- Create a Boolean matrix, tgt_mask, to mask future tokens in the attention mechanism of the decoder body.
Hands-on interactive exercise
Try this exercise by completing the sample code below.
seq_length = 3

# Create a Boolean matrix to mask future tokens
tgt_mask = (1 - torch.____(
    torch.____(1, ____, ____), diagonal=____)
).____()
print(tgt_mask)
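For reference, here is one possible completion of the blanks. It's a minimal sketch assuming the standard PyTorch tensor API (torch.ones and torch.triu) and adds an import torch so the snippet runs standalone:

import torch

seq_length = 3

# torch.ones(1, seq_length, seq_length) builds an all-ones square matrix with a
# batch dimension. torch.triu(..., diagonal=1) keeps only the entries strictly
# above the main diagonal. Subtracting from 1 flips them, leaving ones on and
# below the diagonal, and .bool() converts the result to True/False values.
tgt_mask = (1 - torch.triu(
    torch.ones(1, seq_length, seq_length), diagonal=1)
).bool()
print(tgt_mask)
# tensor([[[ True, False, False],
#          [ True,  True, False],
#          [ True,  True,  True]]])

In the decoder, a mask like this is typically applied by filling the False positions of the attention score matrix with negative infinity before the softmax (for example, scores.masked_fill(~tgt_mask, float('-inf'))), so future tokens receive zero attention weight.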