Designing a mask for self-attention
To ensure the decoder learns to predict each token from only the tokens that precede it, you must mask future tokens when modeling input sequences. You'll build the mask as a triangular matrix of True and False values, with False values in the upper triangle to exclude future tokens.
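For example, with a sequence of three tokens, the mask lets each position attend only to itself and to earlier positions; row i marks which positions query i may attend to:

True   False  False
True   True   False
True   True   True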
This exercise is part of the course
Transformer Models with PyTorch
Exercise instructions
- Create a Boolean matrix, tgt_mask, to mask future tokens in the attention mechanism of the decoder body.
Hands-on interactive exercise
Try this exercise by completing the sample code below.
import torch

seq_length = 3

# Create a Boolean matrix to mask future tokens:
# torch.triu with diagonal=1 marks the strictly upper triangle (future positions),
# subtracting from 1 flips it, and .bool() converts it to True/False values
tgt_mask = (1 - torch.triu(
    torch.ones(1, seq_length, seq_length), diagonal=1)
).bool()
print(tgt_mask)
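To see why the mask matters, here is a minimal sketch of how such a mask is typically applied inside scaled dot-product attention. The scores tensor below is a hypothetical stand-in for the decoder's raw attention scores, not part of the exercise: positions where the mask is False are filled with -inf before the softmax, so they receive zero attention weight.

import torch
import torch.nn.functional as F

seq_length = 3
tgt_mask = (1 - torch.triu(
    torch.ones(1, seq_length, seq_length), diagonal=1)
).bool()

# Hypothetical attention scores: batch of 1, 3 query x 3 key positions
scores = torch.randn(1, seq_length, seq_length)

# Fill disallowed (future) positions with -inf so softmax zeroes them out
masked_scores = scores.masked_fill(~tgt_mask, float("-inf"))
attn_weights = F.softmax(masked_scores, dim=-1)
print(attn_weights)  # the upper triangle of each row is now 0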