Exercise

Input and target dataset

In the previous exercise, you created the vocabulary, the character-to-integer mapping, and the reverse integer-to-character mapping. Now you'll create the input and target datasets. The task is to predict the next character given a sequence of characters. The input data will therefore be sequences of characters, and the target data will be the character that follows each sequence. To build them, you'll split the text into fixed-length sequences and record the next character for each one.

The full text is available in the variable text. The vocabulary and the character-to-integer and integer-to-character mappings are available in the variables vocabulary, char_to_idx, and idx_to_char, respectively. The length of each sequence is set to 40 and is available in the variable maxlen.

Instructions
100 XP
  • Iterate over the full text, create sequences of length maxlen, and append each one to input_data.
  • Iterate over the full text, get the next character for each sequence, and append it to target_data.
  • Print the size of the input dataset.
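
The steps above can be sketched as follows. This is a minimal illustration, not the exercise's reference solution: the sample text is a made-up stand-in for the preloaded text variable, and sliding the window forward one character at a time is an assumption, since the exercise does not state a step size.

```python
# Stand-in values (assumptions): in the exercise, text and maxlen are preloaded.
text = "the quick brown fox jumps over the lazy dog. " * 3
maxlen = 40

input_data = []   # sequences of maxlen characters
target_data = []  # the character that follows each sequence

# Stop maxlen characters before the end so that every sequence
# still has a next character available as its target.
for i in range(0, len(text) - maxlen):
    input_data.append(text[i:i + maxlen])  # characters i .. i + maxlen - 1
    target_data.append(text[i + maxlen])   # the character right after them

# Size of the input dataset
print("Number of sequences:", len(input_data))
```

With a step of one, a text of n characters yields n - maxlen sequences, and each target character becomes the last character of the following sequence.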