Exercise

Preparing the output text

In this exercise, you will prepare the output texts for the translation model. In addition to transforming the texts into sequences of indexes, you also need to one-hot encode each index.

The English texts are loaded in the en_sentences variable, the fitted tokenizer in output_tokenizer, and the English vocabulary size in en_vocab_size.

A function that performs the first step of transforming the output language (converting texts into sequences of indexes) has already been created. It is loaded in the environment as transform_text_to_sequences() and takes two parameters: sentences, which expects a list of sentences in English, and tokenizer, which expects a fitted Tokenizer object from the keras.preprocessing.text module.
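The body of transform_text_to_sequences() is not shown in this exercise, so the following is only a rough sketch of what such a function might do, assuming it maps words to indexes and then right-pads each sentence with zeros to a common length. ToyTokenizer is a hypothetical stand-in for a fitted Keras Tokenizer, used here so the example runs with numpy alone; it assigns indexes alphabetically, whereas the real Tokenizer orders them by word frequency.

```python
import numpy as np

class ToyTokenizer:
    """Hypothetical stand-in for a fitted Keras Tokenizer (illustration only)."""

    def __init__(self, sentences):
        words = sorted({w for s in sentences for w in s.lower().split()})
        # Index 0 is left unused so it can serve as the padding value.
        self.word_index = {w: i + 1 for i, w in enumerate(words)}

    def texts_to_sequences(self, sentences):
        # Unknown words are skipped.
        return [[self.word_index[w] for w in s.lower().split() if w in self.word_index]
                for s in sentences]

def transform_text_to_sequences(sentences, tokenizer):
    """Assumed behaviour: words -> indexes, then right-pad with zeros."""
    sequences = tokenizer.texts_to_sequences(sentences)
    sentences_len = max(len(seq) for seq in sequences)
    padded = np.zeros((len(sequences), sentences_len), dtype=int)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    return padded

# Toy data, not the exercise's real en_sentences.
en_sentences = ["i am cold", "run"]
output_tokenizer = ToyTokenizer(en_sentences)
Y = transform_text_to_sequences(en_sentences, output_tokenizer)
print(Y)
```

The result is a 2-D integer array with one row per sentence, which is the form the one-hot encoding step expects.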

numpy is loaded as np.

Instructions

  • Pass the en_sentences and output_tokenizer variables to the transform_text_to_sequences() function to initialize the Y variable.
  • Use the to_categorical() function to one-hot encode the sentences, using en_vocab_size as the number of classes.
  • Convert the temporary list to a numpy array and reshape it to (num_sentences, sentences_len, en_vocab_size).
  • Print the raw text and the transformed version.
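The one-hot encoding and reshaping steps above can be sketched as follows. This is a minimal illustration on toy data, not the exercise solution: one_hot() is a hypothetical numpy-only stand-in for Keras' to_categorical(), and the values of Y and en_vocab_size, as well as the name temp for the temporary list, are assumptions made for the example.

```python
import numpy as np

def one_hot(indexes, num_classes):
    """Numpy stand-in for to_categorical(): one row of num_classes per index."""
    return np.eye(num_classes, dtype="float32")[indexes]

# Toy padded sequences (2 sentences, 3 tokens each) and vocabulary size.
Y = np.array([[3, 1, 2], [4, 0, 0]])
en_vocab_size = 5

# One-hot encode sentence by sentence, collecting results in a temporary list.
temp = []
for sentence in Y:
    temp.append(one_hot(sentence, en_vocab_size))

# Convert the list to an array of shape (num_sentences, sentences_len, en_vocab_size).
Y_onehot = np.array(temp).reshape(Y.shape[0], Y.shape[1], en_vocab_size)

print("Raw sequences:\n", Y)
print("One-hot shape:", Y_onehot.shape)
```

Each index becomes a vector of length en_vocab_size with a single 1 at the position of that index, so taking the argmax along the last axis recovers the original sequences.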