
Exercise

Designing a mask for self-attention

To ensure that the decoder can learn to predict tokens, future tokens must be masked when modeling the input sequences. You'll build a mask in the form of a triangular matrix of True and False values, with False values above the diagonal to exclude future tokens.

Instructions

100 XP
  • Create a Boolean matrix, tgt_mask, to mask future tokens in the attention mechanism of the decoder body.
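One common way to build such a mask is with `torch.tril`, which zeroes out everything above the diagonal. A minimal sketch (the sequence length of 5 is an assumption for illustration):

```python
import torch

seq_len = 5  # hypothetical sequence length

# Lower-triangular Boolean mask: True on and below the diagonal,
# False above it, so each position can attend only to itself and
# to earlier positions — never to future tokens.
tgt_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(tgt_mask)
```

Row i of the mask corresponds to query position i: the True entries mark the key positions (0 through i) that the decoder is allowed to attend to.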