Text prediction with LSTMs
During the following exercises you will build a toy LSTM model that is able to predict the next word using a small text dataset.
This dataset consists of cleaned quotes from The Lord of the Rings movies. You can find them in the text variable.
You will turn this text into sequences of length 4 and make use of the Keras Tokenizer to prepare the features and labels for your model!
The Keras Tokenizer is already imported for you to use. It assigns a unique number to each unique word and stores the mappings in a dictionary. This is important since the model deals with numbers, but we will later want to decode the output numbers back into words.
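As a quick illustration of how the Tokenizer maps words to integers and back, here is a small sketch; the sample strings are made-up placeholders rather than the actual dataset, and the import path assumes a recent tf.keras install:

from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical sample texts standing in for the provided text variable
sample = ["one ring to rule them all", "one ring to find them"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sample)

# word_index maps each unique word to an integer id (ids start at 1, ordered by frequency)
print(tokenizer.word_index)

# index_word holds the reverse mapping, used later to decode predicted ids back into words
print(tokenizer.index_word[1])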
This exercise is part of the course
Introduction to Deep Learning with Keras
Instructions
- Split the text into an array of words using .split().
- Make sentences of 4 words each, moving one word at a time.
- Instantiate a Tokenizer(), then fit it on the sentences with .fit_on_texts().
- Turn sentences into a sequence of numbers by calling .texts_to_sequences().
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Split text into an array of words
words = ____.____
# Make sentences of 4 words each, moving one word at a time
sentences = []
for i in range(4, len(words)):
    sentences.append(' '.join(words[i-____:i]))
# Instantiate a Tokenizer, then fit it on the sentences
tokenizer = ____
tokenizer.____(____)
# Turn sentences into a sequence of numbers
sequences = tokenizer.____(____)
print("Sentences: \n {} \n Sequences: \n {}".format(sentences[:5],sequences[:5]))