Tokenizing sentences with Keras
Here you will get your hands dirty with the Keras Tokenizer. The Keras Tokenizer is a great utility that helps you to do some crucial text processing with a few lines of code. For example, the Keras Tokenizer will automatically map the words in your vocabulary to IDs with a single function call. Here, you will learn about this in more detail.
You will be creating a Keras Tokenizer object and fitting it on some text, which will allow the Tokenizer to build a dictionary of words and their corresponding IDs. The text used to train the Tokenizer is obtained from the Udacity GitHub Repo.
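To make the idea of a word-to-ID dictionary concrete, here is a minimal pure-Python sketch of the kind of mapping that fitting builds. It is a simplified illustration, not the Keras implementation: the helper name build_word_index and the two sample sentences are hypothetical, and the sketch assumes lowercasing, removal of ASCII punctuation, whitespace splitting, and IDs that start at 1 with the most frequent word first.

```python
from collections import Counter
import string


def build_word_index(texts):
    """Map each word to an integer ID, most frequent word first (IDs start at 1)."""
    # Simplifying assumption: lowercase, delete ASCII punctuation, split on whitespace
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for sentence in texts:
        counts.update(sentence.lower().translate(strip_punct).split())
    # most_common() sorts by descending count; ties keep first-seen order
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(), start=1)}


# Hypothetical sample sentences, standing in for the real training text
sample_text = ["I like apples in summer.", "I like january."]
word_index = build_word_index(sample_text)
print(word_index)
```

Words that occur more often receive smaller IDs, so "i" and "like" come first here, followed by the words that appear only once.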
This exercise is part of the course Machine Translation with Keras.
Exercise instructions
- Define a Keras Tokenizer object.
- Fit the tokenizer on en_text.
- Get the word ID for each word w in the given list ["january", "apples", "summer"].
- Print the word and its corresponding ID.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
from tensorflow.keras.preprocessing.text import Tokenizer

# Define a Keras Tokenizer
en_tok = ____

# Fit the tokenizer on some text
en_tok.____(____)

for w in ["january", "apples", "summer"]:
    # Get the word ID of word w
    id = en_tok.____[____]
    # Print the word and the word ID
    print(____, " has id: ", ____)
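
For reference, the same steps can be sketched end to end on a tiny corpus. This is one possible way to fill in the pattern, not an official solution: sample_text is a made-up stand-in for en_text, so the IDs it prints will differ from those produced by the course data. It assumes a TensorFlow installation that still exposes tensorflow.keras.preprocessing.text.Tokenizer.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical stand-in for en_text (the course data is not reproduced here)
sample_text = [
    "paris is cold during january",
    "she likes apples and grapes",
    "paris is warm during summer",
]

# Define a Keras Tokenizer with default settings
en_tok = Tokenizer()

# Fit the tokenizer on the text; this builds the word-to-ID dictionary
en_tok.fit_on_texts(sample_text)

word_ids = {}
for w in ["january", "apples", "summer"]:
    # Look up the word ID in the fitted tokenizer's word_index dictionary
    word_ids[w] = en_tok.word_index[w]
    # Print the word and the word ID
    print(w, " has id: ", word_ids[w])
```

The lookup raises a KeyError for a word that never appeared in the fitted text, so en_tok.word_index.get(w) may be a safer choice when the vocabulary is not known in advance.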