Designing a mask for self-attention
To ensure that the decoder can learn to predict tokens, it's important to mask future tokens when modeling the input sequences. You'll build a mask in the form of a triangular matrix of True and False values, with False values above the diagonal to exclude future tokens.
This exercise is part of the course Transformer Models with PyTorch.
Instructions
- Create a Boolean matrix, tgt_mask, to mask future tokens in the attention mechanism of the decoder body.
Hands-on interactive exercise
Try this exercise with the sample code below.
import torch

seq_length = 3

# Create a Boolean matrix to mask future tokens: triu with diagonal=1
# selects the strictly upper triangle (the future positions), and
# 1 - (...) flips it so that allowed positions become True
tgt_mask = (1 - torch.triu(
    torch.ones(1, seq_length, seq_length), diagonal=1)
).bool()
print(tgt_mask)
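To see how such a mask is typically consumed, here is a minimal sketch of masked scaled dot-product attention, reusing seq_length and tgt_mask from the snippet above. The queries, keys, and values tensors and the masked_fill step are illustrative assumptions for this sketch, not part of the exercise.

import math
import torch.nn.functional as F

# Illustrative random inputs with an assumed model dimension
batch_size, d_model = 1, 4
queries = torch.randn(batch_size, seq_length, d_model)
keys = torch.randn(batch_size, seq_length, d_model)
values = torch.randn(batch_size, seq_length, d_model)

# Raw attention scores: shape (batch, seq_length, seq_length)
scores = torch.bmm(queries, keys.transpose(1, 2)) / math.sqrt(d_model)

# Where tgt_mask is False (future tokens), set the score to -inf
# so that softmax assigns those positions zero attention weight
scores = scores.masked_fill(~tgt_mask, float("-inf"))
weights = F.softmax(scores, dim=-1)
output = torch.bmm(weights, values)  # future tokens contribute nothing

Because the masked scores are -inf before the softmax, each position's attention weights over future tokens come out exactly zero, which is what lets the decoder train on full sequences without leaking information from positions to the right.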