Preparing the output text
In this exercise, you will prepare the output texts used by the translation model. Besides transforming the texts into sequences of indexes, you also need to one-hot encode each index.
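As a refresher, one-hot encoding turns each index into a vector of zeros with a single 1 at that index's position. A minimal numpy sketch (the function name `one_hot` is just for illustration):

```python
import numpy as np

def one_hot(sequence, num_classes):
    """One-hot encode a sequence of integer indexes."""
    encoded = np.zeros((len(sequence), num_classes))
    # Set a 1 at each (row, index) position
    encoded[np.arange(len(sequence)), sequence] = 1
    return encoded

# A 3-word sentence over a toy vocabulary of 5 tokens
print(one_hot([2, 0, 4], num_classes=5))
```

Each row of the result corresponds to one word, and the row length equals the vocabulary size.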
The English texts are loaded in the en_sentences variable, the fitted tokenizer in the output_tokenizer variable, and the English vocabulary size in en_vocab_size.
Also, a function that performs the first step of transforming the output language (converting texts into sequences of indexes) is already created. It is available in the environment as transform_text_to_sequences() and has two parameters: sentences, which expects a list of sentences in English, and tokenizer, which expects a fitted Tokenizer object from the keras.preprocessing.text module.
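The helper's implementation is not shown; it presumably wraps the tokenizer's texts_to_sequences() and pads the results to equal length. A self-contained sketch under those assumptions, with a minimal stand-in for the Tokenizer interface:

```python
import numpy as np

class ToyTokenizer:
    """Minimal stand-in for keras.preprocessing.text.Tokenizer,
    exposing only the texts_to_sequences() method used here."""
    def __init__(self, word_index):
        self.word_index = word_index

    def texts_to_sequences(self, texts):
        return [[self.word_index[w] for w in t.lower().split()
                 if w in self.word_index] for t in texts]

def transform_text_to_sequences(sentences, tokenizer):
    """Convert sentences to index sequences, zero-padded to equal
    length (the padding behavior is an assumption about the real
    helper)."""
    seqs = tokenizer.texts_to_sequences(sentences)
    max_len = max(len(s) for s in seqs)
    return np.array([s + [0] * (max_len - len(s)) for s in seqs])

tok = ToyTokenizer({"i": 1, "love": 2, "you": 3})
print(transform_text_to_sequences(["i love you", "love you"], tok))
```

The result is a 2-D array of shape (num_sentences, sentences_len), which is the input the one-hot encoding step expects.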
numpy is loaded as np.
This exercise is part of the course
Recurrent Neural Networks (RNNs) for Language Modeling with Keras
Exercise instructions
- Pass the en_sentences and output_tokenizer variables to the transform_text_to_sequences() function to initialize the Y variable.
- Use the to_categorical() function to one-hot encode the sentences, using the en_vocab_size variable as the number of classes.
- Transform the temporary list into a numpy array and reshape it to shape (num_sentences, sentences_len, en_vocab_size).
- Print the raw text and the transformed one.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize the variable
Y = transform_text_to_sequences(____, ____)
# Temporary list
ylist = list()
for sequence in Y:
    # One-hot encode sentence and append to list
    ylist.append(____(sequence, num_classes=____))
# Update the variable
Y = np.array(ylist).reshape(____, Y.shape[1], en_vocab_size)
# Print the raw sentence and its transformed version
print("Raw sentence: {0}\nTransformed: {1}".format(____, Y[0]))
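For reference, a possible completion of the blanks. This sketch uses toy data in place of the preloaded variables and a numpy stand-in for keras.utils.to_categorical, so the filled-in values are assumptions about the intended answer rather than the official solution:

```python
import numpy as np

def to_categorical(sequence, num_classes):
    """numpy stand-in for keras.utils.to_categorical: one row of the
    identity matrix per index."""
    return np.eye(num_classes)[sequence]

# Toy stand-ins for the preloaded exercise variables
en_sentences = ["i love you"]
en_vocab_size = 5
# What transform_text_to_sequences(en_sentences, output_tokenizer)
# might return for this toy sentence
Y = np.array([[1, 2, 3]])

# Temporary list
ylist = list()
for sequence in Y:
    # One-hot encode sentence and append to list
    ylist.append(to_categorical(sequence, num_classes=en_vocab_size))

# Reshape to (num_sentences, sentences_len, en_vocab_size)
Y = np.array(ylist).reshape(Y.shape[0], Y.shape[1], en_vocab_size)

# Print the raw sentence and its transformed version
print("Raw sentence: {0}\nTransformed: {1}".format(en_sentences[0], Y[0]))
```

In the real exercise the same pattern applies, with keras.utils.to_categorical and the preloaded en_sentences, output_tokenizer, and en_vocab_size variables.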