
The Text Generating Function

1. The Text Generating Function

In this lesson, we are going to focus on how to generate phrases or custom text given a trained model. In some applications we want to generate only one sentence, while in others we may be interested in a whole paragraph, or even an entire book. So, let's see how to create rules for the generated text.

2. Generating sentences

To generate sentences, we first need to define what a sentence is so that we can create rules to generate them. One definition is to use punctuation to determine when the sentence ends. For example, we can use the period, exclamation mark, or question mark. If we use this approach, we must make sure those punctuation marks are present in the vocabulary. Another definition is to use special tokens for the beginning and end of a sentence. For example, use the token "<sent>" to start the sentence and "</sent>" to mark its end. This approach requires the training data to be pre-processed to insert these tokens in every sentence before training.
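As a minimal sketch of the second approach, here is one way the pre-processing step could look. The token names "<sent>" and "</sent>" and the helper function are illustrative assumptions, not the course's code.

import re

def add_sentence_tokens(text):
    # Split on sentence-ending punctuation, keeping each sentence intact.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Wrap every sentence with a start token and an end token.
    return " ".join(f"<sent> {s} </sent>" for s in sentences if s)

print(add_sentence_tokens("Hello there! How are you?"))
# <sent> Hello there! </sent> <sent> How are you? </sent>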

3. Generating sentences

To generate a sentence using the period to define its end, we first initialize an empty string that will contain the sentence. Then we loop until the model predicts a period. We get the model's next prediction using the method .predict. The predict method returns an array with one element containing the predictions, so we use only the first position, because we generate one character per step and use it to predict the next one. Since the model returns a vector the size of the vocabulary, where each element is the probability that the next character is that element, we take the one with the highest value. Then we use the dictionary to transform the index into the corresponding character. Finally, we concatenate the next character onto the sentence. After this, we need to update the variable X so that the next prediction will not be the same. To generate paragraphs or other kinds of text blocks, you can simply change the condition of the while loop.
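Here is a sketch of that loop, assuming a trained character-level Keras model called model, a dictionary index_to_char mapping indices back to characters, and a seed sequence X of shape (1, sequence length) holding character indices. These names and the windowing scheme are assumptions for illustration.

import numpy as np

def generate_sentence(model, X, index_to_char):
    sentence = ""
    next_char = ""
    # Loop until the model predicts the end-of-sentence character.
    while next_char != ".":
        # predict returns an array with one element; take the first position.
        preds = model.predict(X)[0]
        # Pick the index with the highest probability over the vocabulary.
        next_index = np.argmax(preds)
        # Map the index back to a character and concatenate it onto the sentence.
        next_char = index_to_char[next_index]
        sentence += next_char
        # Update X: drop the oldest index and append the new one, so the
        # next prediction is conditioned on the updated context.
        X = np.append(X[:, 1:], [[next_index]], axis=1)
    return sentence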

4. Probability scaling

One way to change the probability distribution over the vocabulary is to use a scaling factor called temperature. The name comes from physics, but all it does is make the distribution sharper or smoother. It takes positive values, and the closer it gets to zero, the more the model emphasizes the class with the highest probability. In other words, a low temperature increases the probability of the most likely class and decreases the others. When the temperature is equal to one, there is no scaling of the softmax output. For higher values of temperature, the model starts to use different words, because the probability distribution becomes smoother: the higher the temperature, the more equal the probabilities become. This leads to more creative outputs using words other than the expected one, but also to more mistakes. The temperature is a hyperparameter, meaning you can try different values to see how they change the predictions or leave it at the default value of one.
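To make the effect concrete, here is a small illustration on a made-up three-class distribution (the numbers are not from the slides):

import numpy as np

probs = np.array([0.7, 0.2, 0.1])

for temperature in [0.2, 1.0, 2.0]:
    # Scale the distribution: log, divide by temperature, exponentiate, renormalize.
    scaled = np.exp(np.log(probs) / temperature)
    scaled = scaled / scaled.sum()
    print(temperature, scaled.round(3))

# 0.2 -> [0.998 0.002 0.   ]  sharper: almost always the top class
# 1.0 -> [0.7   0.2   0.1  ]  no scaling
# 2.0 -> [0.523 0.28  0.198]  smoother: more varied, "creative" choices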

5. Probability scaling

The scaling function is relatively simple. It has two parameters: softmax_pred, which contains the probability values returned by the softmax function on the last layer of the model, and the temperature value, which defaults to one. It takes the logarithm of the softmax output and divides each element of the vector by the temperature value. Then it re-applies the exponential function. Next, it divides the obtained values by their sum to get a scaled distribution that sums to 1. The next step is to run one simulation of a multinomial distribution, using the values obtained in the previous step as the probability of each class. Finally, we return the class that was drawn in the multinomial simulation, which is not necessarily the one with the highest probability.
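A sketch of that function could look like this; the parameter names follow the description above, but the function name and implementation details are assumptions:

import numpy as np

def scale_softmax(softmax_pred, temperature=1.0):
    # Take the log of the softmax output and divide each element by the temperature.
    preds = np.log(softmax_pred) / temperature
    # Re-apply the exponential function.
    exp_preds = np.exp(preds)
    # Divide by the sum so the scaled distribution sums to 1.
    scaled = exp_preds / np.sum(exp_preds)
    # Run one simulation of a multinomial distribution with the scaled probabilities.
    simulation = np.random.multinomial(1, scaled, 1)
    # Return the sampled class, not necessarily the one with the highest probability.
    return np.argmax(simulation)

Inside the generation loop, this function would replace the plain argmax over the model's predictions, so each step samples the next character instead of always taking the most likely one.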

6. Let's practice!

Those are the steps that make for good text generation practice. Let's put them into practice!