Creating positional encodings
Embedding the tokens is a good start, but these embeddings still lack information about each token's position in the sequence. To remedy this, the transformer architecture uses positional encodings, which add position information for each token directly into its embedding.
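To see why this matters, note that an embedding layer looks each token up independently of where it occurs, so two sequences containing the same tokens in a different order receive the same set of embedding vectors. The short sketch below illustrates this; the vocabulary size, token IDs, and embedding_dim=4 are illustrative assumptions, not part of the exercise.

import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy embedding layer: vocabulary of 10 tokens, 4-dimensional embeddings (illustrative values)
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

sequence = torch.tensor([[2, 5, 7]])       # e.g. "dog bites man" as token IDs
reversed_seq = torch.tensor([[7, 5, 2]])   # "man bites dog"

# Token 2 gets exactly the same vector whether it appears first or last,
# so the model cannot tell the two orderings apart from the embeddings alone
print(torch.equal(embedding(sequence)[0, 0], embedding(reversed_seq)[0, 2]))  # True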
You'll create a PositionalEncoding class with the following parameters:
- d_model: the dimensionality of the input embeddings
- max_seq_length: the maximum sequence length (or the sequence length, if every sequence has the same length)
This exercise is part of the course
Transformer Models with PyTorch
Exercise instructions
- Create a matrix of zeros of dimensions max_seq_length by d_model.
- Perform the sine and cosine calculations on position * div_term to create the even and odd positional embedding values.
- Ensure pe isn't a learnable parameter during training.
- Add the transformed positional embeddings to the input token embeddings, x.
Hands-on interactive exercise
Try this exercise by completing the sample code.
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_seq_length):
        super().__init__()
        # Create a matrix of zeros of dimensions max_seq_length by d_model
        pe = ____
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model))
        # Perform the sine and cosine calculations
        pe[:, 0::2] = torch.____(position * div_term)
        pe[:, 1::2] = torch.____(position * div_term)
        # Ensure pe isn't a learnable parameter during training
        self.____('____', pe.unsqueeze(0))

    def forward(self, x):
        # Add the positional embeddings to the token embeddings
        return ____ + ____[:, :x.size(1)]

pos_encoding_layer = PositionalEncoding(d_model=512, max_seq_length=4)
output = pos_encoding_layer(token_embeddings)
print(output.shape)
print(output[0][0][:10])
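For reference, one possible completion of the blanks is sketched below as a self-contained script. The token_embeddings tensor is not defined in this exercise, so a random stand-in with batch size 8, sequence length 4, and d_model 512 is created here purely for demonstration.

import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_seq_length):
        super().__init__()
        # Matrix of zeros of dimensions max_seq_length by d_model
        pe = torch.zeros(max_seq_length, d_model)
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model))
        # Sine on the even indices, cosine on the odd indices
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # register_buffer stores pe with the module's state without making it learnable
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        # Add the positional embeddings to the token embeddings
        return x + self.pe[:, :x.size(1)]

# Stand-in token embeddings (shapes assumed for demonstration only)
token_embeddings = torch.randn(8, 4, 512)
pos_encoding_layer = PositionalEncoding(d_model=512, max_seq_length=4)
output = pos_encoding_layer(token_embeddings)
print(output.shape)        # torch.Size([8, 4, 512])
print(output[0][0][:10])   # first ten values of the first token's encoded embedding

Using register_buffer rather than nn.Parameter means pe moves with the module (for example to the GPU and into its state_dict) but receives no gradient updates during training.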