Exercise

The decoder layer

Like encoder transformers, decoder transformers are built from multiple layers that combine multi-head attention and feed-forward sublayers. Have a go at combining these components to build a DecoderLayer class.

The MultiHeadAttention and FeedForwardSubLayer classes are available for you to use, along with the tgt_mask you created.

Instructions

100 XP

Complete the forward() method to pass the input embeddings through the layers defined in the __init__ method (a reference sketch follows these steps):

  • Perform the attention calculation using the tgt_mask provided and the input embeddings, x, for the query, key, and value matrices.
  • Apply dropout and the first layer normalization, norm1.
  • Perform the pass through the feed-forward sublayer, ff_sublayer.
  • Apply dropout and the second layer normalization, norm2.
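For reference, here is a minimal sketch of what the completed layer might look like. The exercise only states that MultiHeadAttention and FeedForwardSubLayer are available, so the stand-in definitions and constructor signatures below (d_model, num_heads, d_ff) are assumptions built on PyTorch's nn.MultiheadAttention, and the residual additions inside norm1 and norm2 follow the standard transformer pattern even though the steps above don't spell them out:

import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    # Stand-in for the course-provided class; the real implementation
    # may differ. Wraps PyTorch's built-in multi-head attention.
    def __init__(self, d_model, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, query, key, value, mask=None):
        # nn.MultiheadAttention treats True in attn_mask as a
        # position that is NOT allowed to attend
        out, _ = self.attn(query, key, value, attn_mask=mask)
        return out

class FeedForwardSubLayer(nn.Module):
    # Stand-in: a two-layer position-wise feed-forward network
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)

class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        self.ff_sublayer = FeedForwardSubLayer(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, tgt_mask):
        # Masked self-attention: x serves as query, key, and value
        attn_output = self.self_attn(x, x, x, tgt_mask)
        # Dropout, residual connection, then the first layer norm
        x = self.norm1(x + self.dropout(attn_output))
        # Position-wise feed-forward sublayer
        ff_output = self.ff_sublayer(x)
        # Dropout, residual connection, then the second layer norm
        x = self.norm2(x + self.dropout(ff_output))
        return x

# Usage: a causal target mask blocks attention to future positions.
# Here True marks blocked positions, matching nn.MultiheadAttention.
d_model, num_heads, d_ff, dropout = 512, 8, 2048, 0.1
layer = DecoderLayer(d_model, num_heads, d_ff, dropout)
x = torch.randn(2, 10, d_model)  # (batch, seq_len, d_model)
tgt_mask = torch.triu(torch.ones(10, 10, dtype=torch.bool), diagonal=1)
out = layer(x, tgt_mask)         # shape: (2, 10, 512)

Note that mask conventions vary: this stand-in treats True as a blocked position (PyTorch's built-in convention), while the course's own MultiHeadAttention and the tgt_mask you created may use the opposite convention, with zeros marking blocked positions.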