Get startedGet started for free

Wrap-up and the final showdown

1. Wrap-up and the final showdown

You've learned a great deal about machine translation and maybe a little bit of French as well. Let's have a look back at what you've learned.

2. What you've done so far

First, in chapter 1 you learned what the encoder decoder architecture looks like and how it applies to machine translation. You then played around with a sequential model known as GRU, or, gated recurrent units. In chapter 2 you looked more closely at the encoder decoder architecture and implemented an actual encoder decoder model in Keras. You also learned how to use Dense and TimeDistributed layers in Keras to implement a prediction layer that outputs translation words.

3. What you've done so far

In chapter 3, you learned various data preprocessing techniques. You then trained an actual machine translation model and used it to generate translations. Finally in chapter 4 you learned about a training method known as "teacher forcing" which gives even better performance. You trained your own model using teacher forcing and then generated translations. At the end, you learned about word embeddings and how they can be incorporated to the machine translation model.

4. Machine transation models

In this course, you implemented three different models for an English to French translation task. Model 1 was the most basic model. The encoder consumed the English words as onehot encoded vectors and produced a context vector. Next the decoder consumed this context vector and produced the correct translation. In model 2, the encoder remained the same. The decoder in this model predicted the next word in the translation, given the previous words. For model 3, we replaced onehot vectors with word vectors. Word vectors are much more powerful than onehot vectors and enables the model to learn semantics of words. For example, word vectors capture that a cat is more similar to a dog than a window.

5. Performance of different models

Here you can see the performance of those three models. You can see that the models trained with teacher forcing give the best results. You should note that the model that uses word vectors gets to a higher accuracy much quicker than the model that is not using word vectors.

6. Latest developments and further reading

Though you used the accuracy to evaluate model performance, there is a better metric known as BLEU which tries to imitate how a human would assess the translation. Another important thing to know is how out-of-vocabulary words are treated in productionized models. For example, Google cannot simply replace unknown words with a special token. To address this problem these models use a word piece model. A word piece model will identify most frequent sub-words in a corpus. For example, if the model has seen the words "low" and "newer", and has learned the sub-words "low" and "er", the model can represent the unseen word "lower". One of the most important developments in the field of NLP is the Transformer. Transformer is based on the encoder-decoder architecture. However it does not use any sequential models like GRUs. Rather, it uses something known as attention which is more light-weight than a GRU.

7. All the best!

I hope this has been a fruitful journey about machine translation and I wish you all the best.