
Exercise

Designing a mask for self-attention

To ensure that the decoder can learn to predict tokens, future tokens must be masked when modeling the input sequences. You'll build a mask in the form of a triangular matrix of True and False values, with False values above the diagonal to exclude future tokens.

Instructions

100 XP
  • Create a Boolean matrix, tgt_mask, to mask future tokens in the attention mechanism of the decoder body.
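One common way to build such a mask is with `torch.tril`, which zeroes out everything above the diagonal. A minimal sketch (the sequence length of 5 is an assumption for illustration):

```python
import torch

seq_len = 5  # hypothetical sequence length

# Lower-triangular Boolean mask: True on and below the diagonal,
# False above it, so each position can attend only to itself and
# to earlier positions — never to future tokens.
tgt_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(tgt_mask)
```

Row i of the mask corresponds to query position i: the True entries mark the key positions (0 through i) that the decoder is allowed to attend to.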