
Generating translations from the model

1. Generating translations from the model

In this lesson you will learn how to generate translations from the model you trained.

2. Previous model vs new model

In the previous model, you generated the full translation straight after feeding in the encoder input, because the decoder depended only on the context vector to produce the translation. But what can you use as the decoder input in this model?

3. Trained model

During the training of the teacher-forced model, you provided the French sentences as inputs to the decoder. But what are the decoder inputs when performing a new translation? You cannot provide the translation as an input, because that is exactly what you want the model to generate.

4. Decoder of the inference model

You can solve this by building a recursive decoder which generates predictions for a single time step. First, it takes in a onehot encoded input word and the previous decoder state as its initial state. The GRU layer then produces an output and a new state. This GRU output goes through a Dense layer and produces an output word. In the next time step, the output word and the new GRU state from the previous step become the inputs to the decoder.

5. Full inference model

Here, you can see what the full inference model looks like in comparison to the last inference model you implemented.

6. Value of sos and eos tokens

But what can you feed in as the first input? This is where the "sos" token shines. Since it is not an actual word in the translation and is used only to mark the beginning of the translation, you can input "sos" as the first word to the decoder. Then you recursively generate French words, one at a time. Since you used "eos" to mark the end of a sentence, you can stop generating words once the model outputs "eos". But as a safety measure, you should also set a maximum length the generation can run for, in case the model never outputs "eos".

7. Defining the generator encoder

The encoder will be identical to the encoder of the previous model. It has an input layer, en_inputs, and a GRU layer which returns the last state, en_state. Then you create a model that takes in en_inputs and outputs en_state.
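As a minimal sketch in Keras, the inference encoder could look like the following; the sequence length en_len, vocabulary size en_vocab and hidden size hsize are assumed example values that are not given in this lesson.

```python
from tensorflow.keras.layers import Input, GRU
from tensorflow.keras.models import Model

en_len, en_vocab, hsize = 15, 150, 48  # assumed example values

# Accepts a batch of onehot encoded English sentences
en_inputs = Input(shape=(en_len, en_vocab))

# return_state=True makes the GRU also return its last hidden state, en_state
en_gru = GRU(hsize, return_state=True)
en_out, en_state = en_gru(en_inputs)

# The inference encoder maps an English sentence to the context vector
encoder = Model(inputs=en_inputs, outputs=en_state)
```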

8. Defining the generator decoder

In the decoder, you will first define two input layers. The first input layer takes in a batch of single onehot encoded French words. But since the GRU layer expects a time-series input, you have to add a dimension of length 1 as the first value of the shape argument; in other words, it is a time-series input of sequence length one. You then need another input layer to feed in the previous GRU state, which will be a batch of hidden states. Next, you define a GRU layer which takes in de_inputs and de_state_in as the initial state. Finally, you have a Dense layer which takes the GRU output and produces the final predictions. Note that you don't need a TimeDistributed layer, since the input is a single word.
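A minimal sketch of this single-step decoder, assuming the same hidden size hsize as the encoder and an assumed French vocabulary size fr_vocab, could look like this:

```python
from tensorflow.keras.layers import Input, GRU, Dense
from tensorflow.keras.models import Model

fr_vocab, hsize = 250, 48  # assumed example values

# A batch of single onehot encoded French words: sequence length 1
de_inputs = Input(shape=(1, fr_vocab))
# A batch of previous GRU hidden states
de_state_in = Input(shape=(hsize,))

# The GRU consumes one word plus the previous state and returns a new state
de_gru = GRU(hsize, return_state=True)
de_out, de_state_out = de_gru(de_inputs, initial_state=de_state_in)

# No TimeDistributed wrapper is needed, since the input is a single word
de_dense = Dense(fr_vocab, activation='softmax')
de_pred = de_dense(de_out)

decoder = Model(inputs=[de_inputs, de_state_in],
                outputs=[de_pred, de_state_out])
```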

9. Copying the weights

One of the most important steps is to copy the weights of the trained model to the inference model. All the knowledge gained from training is encoded in the weights of the model. You can get the weights of a layer in Keras using the get_weights() function, and set the weights of a layer using the set_weights() function. You only need to do this for the layers that have weights. In our machine translation model, there are three such layers: the encoder GRU layer, the decoder GRU layer and the decoder Dense layer. For example, if the encoder GRU layer of the trained model is called tr_en_gru, you can copy the weights of that layer to the inference model's encoder GRU, en_gru, using this syntax.
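As a sketch, assuming the trained model's layers are called tr_en_gru, tr_de_gru and tr_de_dense (only tr_en_gru is named in this lesson) and the inference model's layers are en_gru, de_gru and de_dense as in the sketches above, the copying could look like this:

```python
# Copy the trained weights into the inference model, layer by layer
en_gru.set_weights(tr_en_gru.get_weights())      # encoder GRU
de_gru.set_weights(tr_de_gru.get_weights())      # decoder GRU
de_dense.set_weights(tr_de_dense.get_weights())  # decoder Dense
```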

10. Generating translations

Let's now generate a translation. First, you convert the English sentence to a onehot encoded sequence using the sents2seqs function. Remember to reverse the order of the words. Then, you can get the context vector, de_s_t, using the predict function on the encoder. Next, you convert a single "sos" token to a onehot encoded sequence using the function word2onehot. word2onehot is a simple function that accepts a tokenizer, a word and the vocabulary size, and outputs a onehot encoded sequence of that word.
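A sketch of these first steps might look as follows; the example sentence, the French tokenizer fr_tok and the exact arguments of sents2seqs() are assumptions, since only the purpose of these helpers is described in this lesson.

```python
en_sent = ['she likes apples and oranges in summer .']  # assumed example sentence

# Onehot encode the English sentence with the word order reversed
# (the argument names shown here for sents2seqs() are an assumption)
en_seq = sents2seqs('source', en_sent, onehot=True, reverse=True)

# The encoder's prediction is the context vector / initial decoder state
de_s_t = encoder.predict(en_seq)

# Onehot encode the "sos" token as the first decoder input
de_seq = word2onehot(fr_tok, 'sos', fr_vocab)
```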

11. Generating translations

Now you can start recursively generating French words. First you define a variable fr_sent to hold the full sentence. Then, for fr_len steps in a loop, you do the following. First, you predict a word using the decoder. Remember that the inputs to the decoder are a word from the French vocabulary and the previous state of the decoder. In the first step, the input word will be "sos" and the input state will be the context vector from the encoder. The model then outputs a word as a probability distribution, along with a new state. The new state is recursively assigned to de_s_t. This means that, at every time step, the previous decoder state becomes an input to the model. Then, you get the actual word string using the probs2word() function. probs2word() is a function that accepts a probability distribution and a tokenizer, and outputs the corresponding French word. After that, you convert that word to a onehot encoded sequence using the word2onehot() function. This is assigned back to de_seq, which becomes an input to the model in the next step. You keep iterating this process until the output word is "eos", or until the end of the for loop.
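A sketch of this loop, reusing the assumed names and helpers from the previous steps (decoder, de_seq, de_s_t, fr_len, fr_tok, fr_vocab, probs2word() and word2onehot()), might look like this:

```python
fr_sent = ''                       # holds the generated French sentence
for _ in range(fr_len):            # fr_len acts as the maximum length
    # Predict a probability distribution over French words and a new state
    de_prob, de_s_t = decoder.predict([de_seq, de_s_t])
    # Convert the probability distribution to an actual French word
    de_w = probs2word(de_prob, fr_tok)
    # The predicted word becomes the decoder input of the next step
    de_seq = word2onehot(fr_tok, de_w, fr_vocab)
    if de_w == 'eos':
        break                      # stop once the model outputs "eos"
    fr_sent += de_w + ' '

print(fr_sent)
```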

12. Time to translate!

Great! Now let's generate some translations.
