Text prediction with LSTMs
In the following exercises, you will build a toy LSTM model that predicts the next word, using a small text dataset.
This dataset consists of cleaned quotes from The Lord of the Rings movies. You can find them in the text variable.
You will turn this text into sequences of length 4 and make use of the Keras Tokenizer to prepare the features and labels for your model!
The Keras Tokenizer is already imported for you to use. It assigns a unique number to each unique word and stores the mappings in a dictionary. This is important since the model deals with numbers, but we will later want to decode the output numbers back into words.
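To see what the Tokenizer does before using it in the exercise, here is a minimal sketch of its number-assignment behavior. It assumes TensorFlow is installed, and the two example quotes are made up for illustration; they are not the course dataset.

```python
# Minimal sketch of the Keras Tokenizer's word-to-number mapping.
# The example texts are placeholders, not the exercise's text variable.
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(["one ring to rule them all", "one ring to find them"])

# word_index is the dictionary mapping each unique word to a unique integer
print(tokenizer.word_index)

# texts_to_sequences replaces each word with its assigned number
print(tokenizer.texts_to_sequences(["one ring to rule them all"]))
```

The mapping in word_index is what lets you decode the model's numeric output back into words later, by inverting the dictionary.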
This exercise is part of the course
Introduction to Deep Learning with Keras
Exercise instructions
- Split the text into an array of words using .split().
- Make sentences of 4 words each, moving one word at a time.
- Instantiate a Tokenizer(), then fit it on the sentences with .fit_on_texts().
- Turn sentences into a sequence of numbers by calling .texts_to_sequences().
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Split text into an array of words
words = ____.____
# Make sentences of 4 words each, moving one word at a time
sentences = []
for i in range(4, len(words)):
    sentences.append(' '.join(words[i-____:i]))
# Instantiate a Tokenizer, then fit it on the sentences
tokenizer = ____
tokenizer.____(____)
# Turn sentences into a sequence of numbers
sequences = tokenizer.____(____)
print("Sentences: \n {} \n Sequences: \n {}".format(sentences[:5], sequences[:5]))
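If you get stuck, the scaffold can be completed along the following lines. This is a sketch, not the official course solution: the text variable here is a placeholder quote, since the real dataset is only available in the exercise environment, and the Tokenizer import assumes TensorFlow is installed.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Placeholder stand-in for the course's text variable
text = "hold your ground hold your ground sons of gondor of rohan my brothers"

# Split text into an array of words
words = text.split()

# Make sentences of 4 words each, moving one word at a time
sentences = []
for i in range(4, len(words)):
    sentences.append(' '.join(words[i-4:i]))

# Instantiate a Tokenizer, then fit it on the sentences
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)

# Turn sentences into a sequence of numbers
sequences = tokenizer.texts_to_sequences(sentences)
print("Sentences: \n {} \n Sequences: \n {}".format(sentences[:5], sequences[:5]))
```

Note that the loop slides a 4-word window one word at a time, so consecutive sentences overlap in 3 words; this overlap is what gives the model many training examples from a small text.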