Part 1: Exploring the to_categorical() function

Did you know that in real-world problems, the vocabulary size can grow very large (e.g. more than a hundred thousand words)?

This exercise is split into two parts, in which you will learn why it is important to set the num_classes argument of the to_categorical() function. In part 1, you will implement the function compute_onehot_length(), which generates one-hot vectors for a given list of words and returns the length of those vectors.

The to_categorical() function has already been imported.
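
To see why num_classes matters, a minimal NumPy sketch can stand in for to_categorical(). It assumes that, when num_classes is omitted, the number of classes is inferred as the largest ID plus one. The helper onehot_sketch is hypothetical and is not the Keras function itself.

```python
import numpy as np

def onehot_sketch(ids, num_classes=None):
    """Hypothetical stand-in for to_categorical(), for illustration only."""
    ids = np.asarray(ids, dtype=int)
    if num_classes is None:
        # Assumed behaviour: infer the class count from the largest ID seen
        num_classes = ids.max() + 1
    # Row i of the identity matrix is the one-hot vector for ID i
    return np.eye(num_classes)[ids]

# Only IDs 0 and 1 appear, so the inferred vector length is 2
short = onehot_sketch([0, 1])
# Fixing num_classes ties the vector length to the vocabulary size instead
fixed = onehot_sketch([0, 1], num_classes=5)
print(short.shape, fixed.shape)
```

Under these assumptions, the same two IDs produce vectors of different lengths depending on whether num_classes is given, which is the effect this exercise explores.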

This exercise is part of the course

Machine Translation with Keras


Exercise instructions

  • Create word IDs from words and word2index in compute_onehot_length().
  • Create one-hot vectors by passing the word IDs to the to_categorical() function.
  • Return the length of a single one-hot vector using the <array>.shape syntax.
  • Use compute_onehot_length() to compute and print the length of the one-hot vectors for the list of words He, drank, milk.
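
The general pattern behind these steps (look up IDs, encode them, read the shape) might look like the sketch below. It uses a made-up vocabulary and np.eye as a stand-in encoder instead of to_categorical(), so all names here are assumptions for illustration.

```python
import numpy as np

# Hypothetical vocabulary, not the one used in the exercise
vocab = {"we": 0, "like": 1, "green": 2, "tea": 3}
sentence = ["we", "like", "tea"]

# Look up the integer ID of each word
ids = [vocab[w] for w in sentence]

# Build one row per word; np.eye stands in for a one-hot encoder here
vectors = np.eye(len(vocab))[ids]

# shape is a tuple: (number of words, length of each vector)
n_words, vector_length = vectors.shape
print(ids, n_words, vector_length)
```

The second entry of the shape tuple is the length of a single vector, which is why the exercise indexes shape with 1.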

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

def compute_onehot_length(words, word2index):
  # Create word IDs for words
  word_ids = [____[w] for w in ____]
  # Convert word IDs to onehot vectors
  onehot = ____(____)
  # Return the length of a single one-hot vector
  return onehot.____[1]

word2index = {"He":0, "drank": 1, "milk": 2}
# Compute and print onehot length of a list of words
print(____([____,____,____], ____))