Implementing multi-head attention
Before you dive in and begin building your own MultiHeadAttention class, you'll try out an existing one to see how it transforms the query, key, and value matrices. Recall that these matrices are generated by projecting the input embeddings using linear transformations with learned weights.

The query, key, and value matrices have already been created for you, and the MultiHeadAttention class has already been defined.
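To make that recap concrete, here is a minimal sketch of how such projections might be set up. The tensor names and the dimensions (batch_size, seq_len, embed_dim) are illustrative assumptions, not the exercise's actual setup.

import torch
import torch.nn as nn

# Illustrative dimensions (assumed, not the exercise's actual values)
batch_size, seq_len, embed_dim = 2, 10, 512

# Input embeddings for a small batch of token sequences
x = torch.rand(batch_size, seq_len, embed_dim)

# Learned linear projections that produce the query, key, and value matrices
w_q = nn.Linear(embed_dim, embed_dim)
w_k = nn.Linear(embed_dim, embed_dim)
w_v = nn.Linear(embed_dim, embed_dim)

query, key, value = w_q(x), w_k(x), w_v(x)
print(query.shape)  # torch.Size([2, 10, 512])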
This exercise is part of the course Transformer Models with PyTorch.

Exercise instructions
- Define the attention parameters for eight attention heads and input embeddings with a dimensionality of 512.
- Create an instance of the MultiHeadAttention class using the defined parameters.
- Pass the query, key, and value matrices through the multihead_attn mechanism.
Interactive exercise

Try this exercise by completing the sample code below.
# Define attention parameters
d_model = ____
num_heads = ____
# Instantiate a MultiHeadAttention instance
multihead_attn = ____
# Pass the query, key, and value matrices through the mechanism
output = ____
print(output.shape)
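One possible way to fill in the blanks is sketched below. Because the course's actual MultiHeadAttention definition and the prepared query, key, and value tensors aren't shown here, the sketch includes a minimal stand-in class and random tensors so it runs on its own; the constructor signature MultiHeadAttention(d_model, num_heads) and the call multihead_attn(query, key, value) are assumptions based on the instructions above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Minimal stand-in for the course's predefined class (an assumption)."""
    def __init__(self, d_model, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def split_heads(self, x):
        # Reshape (batch, seq, d_model) -> (batch, heads, seq, head_dim)
        batch_size, seq_len, _ = x.shape
        return x.view(batch_size, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, query, key, value):
        q = self.split_heads(self.q_proj(query))
        k = self.split_heads(self.k_proj(key))
        v = self.split_heads(self.v_proj(value))
        # Scaled dot-product attention per head
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        attn = F.softmax(scores, dim=-1)
        context = (attn @ v).transpose(1, 2).contiguous()
        batch_size, seq_len, _, _ = context.shape
        # Concatenate heads and apply the output projection
        return self.out_proj(context.view(batch_size, seq_len, -1))

# Illustrative query, key, and value matrices (the exercise provides its own)
query = key = value = torch.rand(2, 10, 512)

# Define attention parameters
d_model = 512
num_heads = 8

# Instantiate a MultiHeadAttention instance
multihead_attn = MultiHeadAttention(d_model, num_heads)

# Pass the query, key, and value matrices through the mechanism
output = multihead_attn(query, key, value)
print(output.shape)  # expected: torch.Size([2, 10, 512])

Note that the output keeps the same shape as the input embeddings: multi-head attention splits d_model across the eight heads, attends within each head, and then recombines the heads into a 512-dimensional representation.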