Preprocessing data
You now need to prepare the data for the new model, which has two inputs and a single output. The two inputs are the one-hot encoded English words and the one-hot encoded French words excluding the last word.
The output is the one-hot encoded French words excluding the first word. In other words, in the decoder, each input French word has a corresponding output, which is the next word in the sentence. Here you will learn how to implement that.
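The shift between decoder inputs and outputs can be illustrated with a small NumPy sketch. The array below is a made-up stand-in for a one-hot encoded batch of shape (batch, time, vocabulary); the names word_ids, dec_in and dec_out and all sizes are assumptions for illustration and do not come from the exercise data.

```python
import numpy as np

# Hypothetical batch: 2 sentences, 4 time steps, vocabulary of 5 words.
word_ids = np.array([[1, 2, 3, 4],
                     [4, 3, 2, 1]])
onehot = np.eye(5)[word_ids]  # shape (2, 4, 5)

# Decoder inputs: every time step except the last.
dec_in = onehot[:, :-1, :]
# Decoder outputs: every time step except the first.
dec_out = onehot[:, 1:, :]

print(np.argmax(dec_in[0], axis=-1))   # [1 2 3]
print(np.argmax(dec_out[0], axis=-1))  # [2 3 4]
```

At every time step, the decoder output is the word that follows the decoder input, which is why both arrays end up one step shorter than the original sequence.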
You have been provided with the sents2seqs() function, en_text and fr_text.
This exercise is part of the course Machine Translation with Keras.
Exercise instructions
- Obtain a batch of encoder inputs (from i to i+bsize) using the sents2seqs() function (one-hot encoded and reversed).
- Obtain a batch of decoder inputs and outputs (from i to i+bsize) using the sents2seqs() function (one-hot encoded).
- Separate the decoder inputs (all French words except the last) from de_xy by slicing on the time dimension.
- Separate the decoder outputs (all French words except the first) from de_xy.
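As a rough illustration of the batching and reversal mentioned in the first step, the sketch below walks over a toy array in steps of bsize and reverses each batch along the time axis. The array seqs is a hypothetical stand-in for sentences already converted to word IDs, and reversing with [:, ::-1] is an assumption about what a reversed encoder input looks like, not a description of how sents2seqs() is implemented.

```python
import numpy as np

# Hypothetical data: 5 "sentences" of 3 word IDs each.
seqs = np.arange(15).reshape(5, 3)
bsize = 2

batches = []
for i in range(0, len(seqs), bsize):
    # Slicing past the end is safe, so the last batch is simply smaller.
    batch = seqs[i:i + bsize]
    # Reverse each sentence along the time axis.
    reversed_batch = batch[:, ::-1]
    batches.append(reversed_batch)

print([b.shape for b in batches])  # [(2, 3), (2, 3), (1, 3)]
print(batches[0][0])               # [2 1 0]
```

Note that when the number of sentences is not a multiple of bsize, the final batch contains fewer sentences than the others.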
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
bsize = 250
for i in range(0, len(en_text), bsize):
    # Get the encoder inputs using the sents2seqs() function
    en_x = ____('source', ____[____:____], onehot=True, reverse=____)
    # Get the decoder inputs/outputs using the sents2seqs() function
    de_xy = sents2seqs('target', ____[____:____], onehot=True)
    # Separate the decoder inputs from de_xy
    de_x = de_xy[:,____,:]
    # Separate the decoder outputs from de_xy
    de_y = de_xy[:,____,:]

    print("Data from ", i, " to ", i+bsize)
    print("\tnp.argmax() => en_x[0]: ", np.argmax(en_x[0], axis=-1))
    print("\tnp.argmax() => de_x[0]: ", np.argmax(de_x[0], axis=-1))
    print("\tnp.argmax() => de_y[0]: ", np.argmax(de_y[0], axis=-1))