Designing a mask for self-attention
To ensure that the decoder can learn to predict tokens, it's important to mask future tokens when modeling the input sequences. You'll build a mask in the form of a triangular matrix of True and False values, with False values in the upper triangle (above the diagonal) to exclude future tokens.
This exercise is part of the course
Transformer Models with PyTorch
Exercise instructions
- Create a Boolean matrix, `tgt_mask`, to mask future tokens in the attention mechanism of the decoder body.
Hands-on interactive exercise
Try this exercise by completing this sample code.
import torch

seq_length = 3

# Create a Boolean matrix to mask future tokens
tgt_mask = (1 - torch.____(
torch.____(1, ____, ____), diagonal=____)
).____()
print(tgt_mask)
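One way to fill in the blanks is with `torch.triu` and `torch.ones`: `torch.triu(..., diagonal=1)` marks every position above the main diagonal (the future tokens) with 1, subtracting from 1 inverts that pattern, and `.bool()` converts it to the True/False mask the attention mechanism expects. A sketch of the completed exercise:

```python
import torch

seq_length = 3

# torch.triu with diagonal=1 keeps only entries strictly above the diagonal;
# 1 - (...) flips them, so positions on and below the diagonal become 1.
# .bool() turns 1/0 into True/False.
tgt_mask = (1 - torch.triu(
    torch.ones(1, seq_length, seq_length), diagonal=1)
).bool()
print(tgt_mask)
```

The leading dimension of size 1 lets the mask broadcast across a batch of attention scores. The printed mask is True on and below the diagonal, so each position can attend only to itself and earlier tokens.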