Part 2: Exploring the to_categorical() function

In part 1, you implemented the compute_onehot_length() function, which did not use the num_classes argument when computing one-hot vectors.

The num_classes argument controls the length of the one-hot encoded vectors produced by the to_categorical() function. You will see that when you have two different corpora (i.e. collections of texts) with different vocabularies, leaving num_classes undefined can result in one-hot vectors of different lengths.

For this exercise, the compute_onehot_length() function and the word2index dictionary have been provided.
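To make the behavior concrete, here is a minimal sketch of what compute_onehot_length() might look like, assuming it maps each word to an integer ID with word2index and then calls to_categorical() without num_classes. The word2index mapping shown here is only an illustrative example, not the dictionary provided in the exercise.

from tensorflow.keras.utils import to_categorical

# Illustrative mapping only; the exercise provides its own word2index
word2index = {"I": 0, "like": 1, "cats": 2, "We": 3, "dogs": 4,
              "He": 5, "hates": 6, "rabbits": 7}

def compute_onehot_length(words, word2index):
    # Convert each word to its integer ID
    word_ids = [word2index[w] for w in words]
    # With num_classes left unset, to_categorical() sizes the vectors
    # to max(word_ids) + 1, so the length depends on the words present
    onehot = to_categorical(word_ids)
    return onehot.shape[1]

Because the vector length is inferred from the largest ID actually present, two word lists drawn from the same dictionary can still produce one-hot vectors of different lengths.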


Exercise instructions

  • Call compute_onehot_length() on words_1.
  • Call compute_onehot_length() on words_2.
  • Print the lengths of the one-hot vectors obtained for words_1 and words_2.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

words_1 = ["I", "like", "cats", "We", "like", "dogs", "He", "hates", "rabbits"]
# Call compute_onehot_length on words_1
length_1 = ____(____, ____)

words_2 = ["I", "like", "cats", "We", "like", "dogs", "We", "like", "cats"]
# Call compute_onehot_length on words_2
length_2 = ____(____, ____)

# Print length_1 and length_2
print("length_1 =>", ____, " and length_2 => ", ____)