Encoder-decoder architecture

1. Encoder-decoder architecture

Now you will learn about the encoder-decoder architecture in more detail.

2. Encoder-decoder model

A machine translation model works by first consuming the words of the source language sequentially, and then sequentially predicting the corresponding words in the target language.

3. Encoder

However, under the hood, it is actually two different models: an encoder and a decoder. Let's first look at the encoder. In our example, the encoder takes in one-hot vectors of English words as inputs and produces a compressed representation of the inputs known as a context vector.

4. Encoder and decoder

Then, the decoder consumes the context vector as an input and produces probabilistic predictions for each time step. The word for a given time step is selected as the word with the highest probability. Note that, though the inputs to the encoder are ones and zeros, the decoder produces continuous probabilistic outputs. These models are also called sequence-to-sequence models because they map one sequence, an English sentence, to another sequence, a French sentence.

5. Analogy: Encoder-decoder architecture

Let's understand this through an analogy. A teacher is explaining what an elephant is, and you've never seen one. The teacher explains it as "it's big, has large ears and a trunk". From this, you create a mental image of an elephant. This is the encoding process. Later, your friend asks, "what does an elephant look like?". You decode that mental image of the elephant by explaining the features of an elephant, or perhaps by drawing one. This is the decoding process.

6. Reversing sentences - encoder-decoder model

To understand the encoder-decoder architecture better, let's implement a simple model that reverses a sentence. First, the encoder receives a one-hot representation of the sentence and converts it to word IDs. Next, the decoder takes in the word IDs, reverses them, and converts the reversed IDs back to the one-hot representation, resulting in the reversed sentence.

7. Writing the encoder

To implement the encoder, you will first write a function called words2onehot, which converts a given list of words to one-hot vectors. The resulting array of one-hot vectors has the shape (number of words, num_classes); num_classes is three in our example. The encoder is a simple function that takes in an array of one-hot vectors as the argument and returns the word IDs corresponding to those one-hot vectors. To obtain the word IDs from the one-hot vectors, you can use the np.argmax function, which computes the index of the maximum element along a given axis. Since the one-hot vectors are laid out along axis 1, you pass axis=1.
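Here is a minimal sketch of these two functions; the three-word vocabulary, the word2index dictionary, and importing to_categorical from Keras are assumptions for illustration, not the lesson's actual data:

    import numpy as np
    from tensorflow.keras.utils import to_categorical

    # Illustrative three-word vocabulary (an assumption, not from the lesson)
    word2index = {"we": 0, "like": 1, "dogs": 2}

    def words2onehot(word_list, word2index, num_classes=3):
        # Map each word to its ID, then one-hot encode the IDs;
        # the result has shape (number of words, num_classes)
        word_ids = [word2index[w] for w in word_list]
        return to_categorical(word_ids, num_classes=num_classes)

    def encoder(onehot):
        # The word ID is the index of the 1 in each one-hot vector
        return np.argmax(onehot, axis=1)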

8. Writing the encoder

After defining the two functions, you can encode a given sentence. First, call the words2onehot function with a list of strings as the first argument and word2index as the second argument to get the one-hot vectors. Next, call the encoder function with the one-hot vectors as the argument to obtain the context vector. Finally, print the context vector: it contains the word IDs corresponding to the words in the sentence.
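Using the hypothetical vocabulary above, the encoding step would look like this:

    onehot = words2onehot(["we", "like", "dogs"], word2index)
    context = encoder(onehot)
    print(context)  # [0 1 2] -- the word IDs of the sentence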

9. Writing the decoder

The decoder takes in the word IDs produced by the encoder, reverses them, and returns the one-hot vectors of the reversed words. You write the decoder as a function which takes in the context vector produced by the encoder and produces one-hot vectors of the reversed words. The decoder function first reverses the word IDs in the context vector; in NumPy, a 1D array can be reversed with the slice [::-1]. After reversing the word IDs, the one-hot vectors are obtained by calling the to_categorical function. You also need a helper function, onehot2words, which converts a set of one-hot vectors to human-readable words. To do that, onehot2words takes in an array of one-hot vectors and a dictionary, index2word, which maps a word ID to a word. Consequently, index2word is the reverse of the word2index dictionary used in the encoder.
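A sketch of the decoder and the helper, continuing the same assumed setup as before:

    def decoder(context_vector, num_classes=3):
        # Reverse the word IDs with the [::-1] slice, then
        # convert the reversed IDs back to one-hot vectors
        reversed_ids = context_vector[::-1]
        return to_categorical(reversed_ids, num_classes=num_classes)

    def onehot2words(onehot, index2word):
        # Map each one-hot vector back to a word via its word ID
        return [index2word[i] for i in np.argmax(onehot, axis=1)]

    # index2word is word2index with keys and values swapped
    index2word = {i: w for w, i in word2index.items()}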

10. Writing the decoder

Finally, you can compute the decoder output by calling the decoder function with the context vector as an argument, and obtain the reversed words by calling the onehot2words function with the correct arguments.
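With the sketch above, this final step would look like this:

    onehot_rev = decoder(context)
    reversed_words = onehot2words(onehot_rev, index2word)
    print(reversed_words)  # ['dogs', 'like', 'we']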

11. Let's practice!

Now let's implement the sentence-reversing model.