
Preprocessing data

You now need to preprocess the data for the new model, which has two inputs and a single output. The two inputs are the one-hot encoded English words and the one-hot encoded French words excluding the last word.

The output is the one-hot encoded French words excluding the first word. In other words, each French word fed into the decoder is paired with an output that is the next word in the same sentence (teacher forcing). In this exercise you will learn how to implement that.
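As a minimal sketch of this one-step shift, assume a batch shaped (batch_size, time_steps, vocab_size), as produced by one-hot encoding. The toy array below stands in for the decoder data; its word ids and sizes are made up for illustration. Slicing the time dimension (axis 1) gives inputs without the last step and outputs without the first step.

```python
import numpy as np

# Toy batch: 2 French sentences, 5 time steps each, vocabulary of 6 ids.
# The ids are made up for illustration; 0 is used as padding.
word_ids = np.array([
    [1, 4, 2, 5, 0],
    [3, 2, 4, 0, 0],
])
vocab_size = 6

# One-hot encode: shape (batch, time, vocab) = (2, 5, 6)
de_xy = np.eye(vocab_size)[word_ids]

# Decoder inputs: every time step except the last
de_x = de_xy[:, :-1, :]
# Decoder outputs: every time step except the first
de_y = de_xy[:, 1:, :]

print(de_x.shape, de_y.shape)        # (2, 4, 6) (2, 4, 6)
print(np.argmax(de_x[0], axis=-1))   # [1 4 2 5]
print(np.argmax(de_y[0], axis=-1))   # [4 2 5 0]

# At each time step t, the target de_y[:, t] equals the next input de_x[:, t + 1]
assert np.array_equal(de_x[:, 1:, :], de_y[:, :-1, :])
```

Both slices have the same length, so every decoder input step has exactly one target.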

You have been provided with the sents2seqs() function and the English and French sentences in en_text and fr_text.
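The body of sents2seqs() is not shown here. To experiment outside the exercise environment, the hypothetical stand-in below can help. Its signature mirrors only the arguments used in the sample code; the provided function may accept more. The vocabularies in VOCABS and the lengths in SEQ_LENS are invented placeholders. The provided function presumably derives its vocabularies from fitted tokenizers instead. The sketch maps words to ids, post-pads with 0, optionally reverses the time steps, and optionally one-hot encodes.

```python
import numpy as np

# Hypothetical word-to-id vocabularies (id 0 is reserved for padding).
# The provided sents2seqs() presumably builds these from fitted tokenizers.
VOCABS = {
    "source": {"new": 1, "jersey": 2, "is": 3, "quiet": 4},
    "target": {"new": 1, "jersey": 2, "est": 3, "calme": 4},
}
# Hypothetical fixed sequence lengths for English and French
SEQ_LENS = {"source": 5, "target": 6}

def sents2seqs(input_type, sentences, onehot=False, reverse=False):
    """Hypothetical stand-in: words -> ids, post-pad with 0,
    optionally reverse time steps and one-hot encode."""
    vocab = VOCABS[input_type]
    seq_len = SEQ_LENS[input_type]
    ids = np.zeros((len(sentences), seq_len), dtype=int)
    for row, sent in enumerate(sentences):
        # Unknown words are dropped; long sentences are truncated to seq_len
        tokens = [vocab[w] for w in sent.lower().split() if w in vocab][:seq_len]
        ids[row, :len(tokens)] = tokens
    if reverse:
        # Reverse the order of time steps (padding moves to the front)
        ids = ids[:, ::-1]
    if onehot:
        # Shape becomes (num_sentences, seq_len, vocab_size + 1)
        ids = np.eye(len(vocab) + 1)[ids]
    return ids

en_x = sents2seqs("source", ["new jersey is quiet"], onehot=True, reverse=True)
print(en_x.shape)                   # (1, 5, 5)
print(np.argmax(en_x[0], axis=-1))  # [0 4 3 2 1]
```

The reversed encoder input contains the four word ids in reverse order, with the padding id 0 moved to the front.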

This exercise is part of the course Machine Translation with Keras.


Exercise instructions

  • Obtain a batch of encoder inputs (from i to i+bsize) using the sents2seqs() function (one-hot encoded and reversed).
  • Obtain a batch of decoder inputs and outputs (from i to i+bsize) using the sents2seqs() function (one-hot encoded).
  • Separate the decoder inputs (all French words except the last) from de_xy by slicing on the time dimension.
  • Separate the decoder outputs (all French words except the first) from de_xy.
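The batching in the first two steps relies on Python slicing with a step of bsize. The minimal sketch below uses a toy list as a stand-in for en_text or fr_text; the sentences and sizes are illustrative only. It shows that every sentence is visited exactly once and that the final batch can be shorter than bsize.

```python
# Hypothetical stand-in for en_text: 7 sentences, batch size 3
sentences = [f"sentence {k}" for k in range(7)]
bsize = 3

batches = []
for i in range(0, len(sentences), bsize):
    # Slicing stops at the end of the list, so the final batch
    # may hold fewer than bsize sentences (here, just one).
    batch = sentences[i:i + bsize]
    batches.append(batch)
    print(f"Data from {i} to {i + bsize}: {len(batch)} sentences")
```

With 7 sentences this prints batches of 3, 3 and 1. The same pattern applies to the real data whenever len(en_text) is not a multiple of 250.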


Have a go at this exercise by completing this sample code.

import numpy as np

bsize = 250
for i in range(0, len(en_text), bsize):
  # Get the encoder inputs using the sents2seqs() function
  en_x = ____('source', ____[____:____], onehot=True, reverse=____)
  # Get the decoder inputs/outputs using the sents2seqs() function
  de_xy = sents2seqs('target', ____[____:____], onehot=True)
  # Separate the decoder inputs from de_xy
  de_x = de_xy[:,____,:]
  # Separate the decoder outputs from de_xy
  de_y = de_xy[:,____,:]
  
  print("Data from ", i, " to ", i+bsize)
  print("\tnp.argmax() => en_x[0]: ", np.argmax(en_x[0], axis=-1))
  print("\tnp.argmax() => de_x[0]: ", np.argmax(de_x[0], axis=-1))
  print("\tnp.argmax() => de_y[0]: ", np.argmax(de_y[0], axis=-1))