Part 1: Exploring the to_categorical() function
Did you know that in real-world problems, the vocabulary size can grow very large (e.g. more than a hundred thousand words)?
This exercise is broken into two parts, and you will learn the importance of setting the num_classes argument of the to_categorical() function. In part 1, you will implement the function compute_onehot_length(), which generates one-hot vectors for a given list of words and computes the length of those vectors. The to_categorical() function has already been imported.
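To see why num_classes matters, here is a minimal pure-Python sketch. The helper one_hot() is a hypothetical stand-in, not the Keras function itself; it only mirrors the documented default of to_categorical(), which infers the number of classes as the largest ID plus one when num_classes is not given.

```python
def one_hot(ids, num_classes=None):
    """Hypothetical stand-in for to_categorical(), for illustration only."""
    if num_classes is None:
        # Assumed default: infer the vector length from the largest ID seen
        num_classes = max(ids) + 1
    return [[1.0 if j == i else 0.0 for j in range(num_classes)] for i in ids]


word2index = {"He": 0, "drank": 1, "milk": 2}

# All three words: the largest ID is 2, so each vector has 3 entries
all_words = one_hot([word2index[w] for w in ["He", "drank", "milk"]])

# Only two words: the largest ID is 1, so the inferred length shrinks to 2
some_words = one_hot([word2index[w] for w in ["He", "drank"]])

# Fixing num_classes keeps the length tied to the vocabulary size
fixed = one_hot([word2index[w] for w in ["He", "drank"]], num_classes=3)

print(len(all_words[0]), len(some_words[0]), len(fixed[0]))
```

Without a fixed num_classes, the vector length depends on which words happen to appear in the input, which is the issue this two-part exercise explores.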
This exercise is part of the course Machine Translation with Keras.
Exercise instructions
- Create word IDs by using words and word2index in compute_onehot_length().
- Create one-hot vectors from the word IDs using the to_categorical() function.
- Return the length of a single one-hot vector using the <array>.shape syntax.
- Compute and print the length of the one-hot vectors using compute_onehot_length() for the list of words He, drank, milk.
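As a quick reminder of the <array>.shape syntax mentioned above, the sketch below builds a small matrix of one-hot rows with NumPy. Using np.eye() here is only an illustrative assumption; in the exercise the array comes from to_categorical().

```python
import numpy as np

# Two one-hot rows (for IDs 0 and 2) over an assumed vocabulary of 4 words.
# np.eye(4) is the 4x4 identity matrix; indexing it picks one row per ID.
onehot = np.eye(4)[[0, 2]]

# shape is (number of words, length of each one-hot vector)
print(onehot.shape)

# Index 1 of the shape is therefore the length of a single vector
print(onehot.shape[1])
```

The first entry of the shape counts the rows (one per word), so the second entry is the one that gives the vector length.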
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def compute_onehot_length(words, word2index):
    # Create word IDs for words
    word_ids = [____[w] for w in ____]
    # Convert word IDs to onehot vectors
    onehot = ____(____)
    # Return the length of a single one-hot vector
    return onehot.____[1]

word2index = {"He": 0, "drank": 1, "milk": 2}
# Compute and print onehot length of a list of words
print(____([____, ____, ____], ____))