Exercise

Training the word embedding based model

Here you will learn how to implement the training process for a machine translation model that uses word embeddings. Each word is represented as a single integer ID instead of the one-hot encoded vector you used in previous exercises. You will train the model for multiple epochs, traversing the full dataset in batches.

For this exercise you are provided with training data (tr_en and tr_fr) as lists of English and French sentences. You will use only a small sample (1,000 sentences) of the full dataset, since training on all of it would otherwise take very long. You also have the sents2seqs() function and the model, nmt_emb, which you implemented in the previous exercise.
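The sents2seqs() helper is assumed from a previous exercise, so its exact implementation is not shown here. As a rough, hypothetical sketch of what such a helper might do (toy vocabulary built on the fly, zero-padding, optional one-hot expansion and reversal — not the course's actual code):

```python
import numpy as np

def sents2seqs(direction, sentences, onehot=False, reverse=False, pad_len=None):
    # Hypothetical re-implementation for illustration only. The real helper
    # presumably uses a pre-fitted tokenizer; here we grow a toy vocabulary
    # as we go, reserving ID 0 for padding.
    vocab = {}
    seqs = []
    for sent in sentences:
        ids = []
        for w in sent.lower().split():
            if w not in vocab:
                vocab[w] = len(vocab) + 1  # 0 is reserved for padding
            ids.append(vocab[w])
        seqs.append(ids)
    if pad_len is None:
        pad_len = max(len(s) for s in seqs)
    # Pad (or truncate) every sequence to pad_len with trailing zeros
    padded = np.zeros((len(seqs), pad_len), dtype=int)
    for i, s in enumerate(seqs):
        padded[i, :min(len(s), pad_len)] = s[:pad_len]
    if reverse:
        padded = padded[:, ::-1]
    if onehot:
        # Expand each integer ID into a one-hot vector of size len(vocab)+1
        padded = np.eye(len(vocab) + 1, dtype=int)[padded]
    return padded

seqs = sents2seqs('source', ['new jersey is sometimes quiet', 'california is usually quiet'])
print(seqs.shape)  # (2, 5): 2 sentences, padded to the longest (5 words)
```

Without one-hot encoding the output is a 2-D array of word IDs (batch size × sequence length); with onehot=True it gains a third dimension of vocabulary size.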

Instructions
  • Get a single batch of French sentences, without one-hot encoding, using the sents2seqs() function.
  • Get all the words except the last from de_xy (these form the decoder input).
  • Get all the words except the first from de_xy_oh (the one-hot encoded French words; these form the decoder targets).
  • Train the model on this single batch of data.