Designing a mask for self-attention
To ensure that the decoder can learn to predict tokens, it's important to mask future tokens when modeling the input sequences. You'll build a mask in the form of a triangular matrix of True and False values, with False values in the upper diagonal to exclude future tokens.
Bu egzersiz
Transformer Models with PyTorch
kursunun bir parçasıdırEgzersiz talimatları
- Create a Boolean matrix,
tgt_markto mask future tokens in the attention mechanism of the decoder body.
Uygulamalı interaktif egzersiz
Bu örnek kodu tamamlayarak bu egzersizi bitirin.
seq_length= 3
# Create a Boolean matrix to mask future tokens
tgt_mask = (1 - torch.____(
torch.____(1, ____, ____), diagonal=____)
).____()
print(tgt_mask)