Part 2: Exploring the to_categorical() function
In part 1, you implemented the compute_onehot_length() function, which did not use the num_classes argument while computing one-hot vectors. The num_classes argument controls the length of the one-hot encoded vectors produced by the to_categorical() function. You will see that when you have two different corpora (i.e. collections of texts) with different vocabularies, leaving num_classes undefined can result in one-hot vectors of varying length.
For this exercise, the compute_onehot_length() function and the word2index dictionary have been provided.
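The effect of num_classes can be sketched in plain Python, so the example runs without Keras. The helper `one_hot` below is a hypothetical stand-in, not the Keras function itself. It assumes only that to_categorical(), when num_classes is omitted, infers the vector length as the largest class index plus one. The index lists `corpus_a` and `corpus_b` are made up for illustration.

```python
def one_hot(indices, num_classes=None):
    """Hypothetical stand-in for to_categorical(): one row per class index."""
    if num_classes is None:
        # Assumed inference rule: largest index seen, plus one
        num_classes = max(indices) + 1
    return [[1.0 if i == idx else 0.0 for i in range(num_classes)]
            for idx in indices]

# Two made-up corpora, already converted to word indices
corpus_a = [0, 1, 2, 5]   # largest index is 5
corpus_b = [0, 1, 2]      # largest index is 2

# Length inferred from the data: differs between the corpora
inferred_a = len(one_hot(corpus_a)[0])
inferred_b = len(one_hot(corpus_b)[0])

# Length fixed through num_classes: identical for both corpora
fixed_a = len(one_hot(corpus_a, num_classes=6)[0])
fixed_b = len(one_hot(corpus_b, num_classes=6)[0])

print("inferred:", inferred_a, inferred_b)
print("fixed:", fixed_a, fixed_b)
```

Fixing num_classes to the size of the full vocabulary is what keeps vectors from different corpora compatible with one another.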
This exercise is part of the course Machine Translation with Keras.
Exercise instructions
- Call compute_onehot_length() on words_1.
- Call compute_onehot_length() on words_2.
- Print the lengths of one-hot vectors obtained for words_1 and words_2.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
words_1 = ["I", "like", "cats", "We", "like", "dogs", "He", "hates", "rabbits"]
# Call compute_onehot_length on words_1
length_1 = ____(____, ____)
words_2 = ["I", "like", "cats", "We", "like", "dogs", "We", "like", "cats"]
# Call compute_onehot_length on words_2
length_2 = ____(____, ____)
# Print length_1 and length_2
print("length_1 =>", ____, " and length_2 => ", ____)
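One possible way to fill in the blanks is sketched below. Both the `word2index` mapping and the body of `compute_onehot_length()` are assumptions made for illustration, since the versions provided by the course are not shown here. The stand-in returns the largest word index plus one, which is the length to_categorical() is assumed to infer when num_classes is not given.

```python
# Hypothetical vocabulary mapping; the dictionary provided in the exercise may differ
word2index = {"I": 0, "like": 1, "cats": 2, "We": 3, "dogs": 4,
              "He": 5, "hates": 6, "rabbits": 7}

def compute_onehot_length(words, word2index):
    """Assumed stand-in: one-hot vector length when num_classes is inferred."""
    word_ids = [word2index[w] for w in words]
    # Without num_classes, the length follows the largest index present
    return max(word_ids) + 1

words_1 = ["I", "like", "cats", "We", "like", "dogs", "He", "hates", "rabbits"]
# Call compute_onehot_length on words_1
length_1 = compute_onehot_length(words_1, word2index)

words_2 = ["I", "like", "cats", "We", "like", "dogs", "We", "like", "cats"]
# Call compute_onehot_length on words_2
length_2 = compute_onehot_length(words_2, word2index)

# Print length_1 and length_2
print("length_1 =>", length_1, "and length_2 =>", length_2)
```

Under this assumed mapping the two lengths differ, because words_2 never uses the words with the highest indices.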