Recurrent Neural Networks (RNN) for Language Modeling in Python

Exercise

Create vectors of sentences and next characters

This exercise highlights the importance of data preparation. You will use text containing lines spoken by the character Sheldon from the TV show The Big Bang Theory as input, and build the vectors of sentences and next characters that are needed before creating a text generation model.

The text is available in the sheldon variable and the vocabulary (the set of characters) in the vocabulary variable. The hyperparameters chars_window and step are set to 20 and 3, respectively. This means that a sequence of 20 characters is used to predict the next one, and the window shifts 3 characters on every iteration.
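To make the window and step concrete, here is a minimal sketch on a toy string. The text and the window size of 5 are assumptions chosen for readability; they are not the exercise's data or its chars_window value of 20. Only the step of 3 matches the exercise.

```python
# Toy illustration of the sliding window. The text and the small window
# size are stand-ins for readability, not the exercise's actual data.
text = "abcdefghijkl"   # hypothetical 12-character sentence
chars_window = 5        # the exercise uses 20
step = 3                # same step as the exercise

windows = []
# Stop early enough that a "next character" always exists after the window.
for start in range(0, len(text) - chars_window, step):
    window = text[start:start + chars_window]   # input sequence
    target = text[start + chars_window]         # character to predict
    windows.append((window, target))

for window, target in windows:
    print(repr(window), "->", repr(target))
```

Each window overlaps the previous one by chars_window minus step characters, so a smaller step yields more (and more similar) training examples from the same text.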

The pandas package is also loaded in the environment as pd.

Instructions

  • Split the text by line break to loop through sentences.
  • Within each sentence, loop over start positions from 0 up to the sentence length minus chars_window, advancing by step.
  • Append the slice of chars_window characters starting at the current position to the sentences variable, and append the character that follows it to the next_chars variable.
  • Use the obtained vectors to create a pd.DataFrame() and print its first rows.
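
The steps above can be sketched as follows. This is one possible solution, not necessarily the course's reference answer. The sheldon text here is a hypothetical placeholder standing in for the real variable, and the column names sentence and next_char are assumptions.

```python
import pandas as pd

# Placeholder standing in for the exercise's `sheldon` variable
# (hypothetical text, not the real data).
sheldon = (
    "this is a placeholder line of dialogue\n"
    "and here is a second placeholder line"
)
chars_window = 20  # length of each input sequence
step = 3           # how far the window shifts per iteration

sentences = []
next_chars = []

# Split the text by line break to loop through sentences.
for sentence in sheldon.split("\n"):
    # Stop chars_window before the end so a next character always exists.
    for i in range(0, len(sentence) - chars_window, step):
        sentences.append(sentence[i:i + chars_window])
        next_chars.append(sentence[i + chars_window])

# Column names are illustrative assumptions.
df = pd.DataFrame({"sentence": sentences, "next_char": next_chars})
print(df.head())
```

Lines shorter than or equal to chars_window produce no rows, because the inner range is empty for them.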