1. Text-to-speech (TTS)
So we've done speech-to-text. How about we flip the script and go from text to speech?
2. Text-to-speech
Text-to-speech is available in many internet browsers, mobile apps, and accessibility devices.
The models behind it are trained to take text as input and output realistic human narration.
Text-to-speech models are commonly used for making text content more accessible to those with visual impairments or reading difficulties.
3. Text-to-speech with OpenAI
OpenAI's text-to-speech models are available through the audio endpoint, just like speech-to-text. However, we call audio.speech.create() instead of audio.transcriptions.create().
We specify our choice of model and the voice we would like to use from OpenAI's supported list. Check out the linked documentation to experiment with the different voices.
Finally, we provide the text input to convert into speech.
The response from this model is streamed in real time, so to save it to a file, we call the .stream_to_file() method on the response and pass it the file name.
We created an MP3 file here, but we could have also specified the response_format argument of the .create() method to change this to another supported format.
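To make those steps concrete, here is a minimal sketch of what the call might look like. The helper name synthesize_speech, the file name "speech.mp3", and the default model "tts-1" are assumptions for illustration; the client argument is expected to be an OpenAI client from the official openai package.

```python
from pathlib import Path


def synthesize_speech(client, text, out_path="speech.mp3",
                      model="tts-1", voice="onyx", response_format="mp3"):
    """Convert text to speech and save the audio to out_path.

    `client` is assumed to be an openai.OpenAI() instance. The model name,
    voice, and file name defaults are illustrative choices, not requirements.
    """
    # Text-to-speech sits under the audio endpoint: audio.speech.create()
    response = client.audio.speech.create(
        model=model,                      # which TTS model to use
        voice=voice,                      # one of OpenAI's supported voices
        input=text,                       # the text to convert into speech
        response_format=response_format,  # output audio format, e.g. "mp3"
    )
    # Write the returned audio to disk under the given file name
    response.stream_to_file(out_path)
    return Path(out_path)
```

With the openai package installed and an API key configured, this could be called as synthesize_speech(OpenAI(), "Creating human-like speech is now possible with just a few lines of code. Pretty neat, right?"). If you change response_format, it makes sense to change the file extension in out_path to match.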
Let's give this a listen!
4. Onyx
"Creating human-like speech is now possible with just a few lines of code. Pretty neat, right?"
Gone are the days of robotic voices. Did you notice the speech cadence, the pauses for punctuation, and even the voice inflection for the question at the end of the sentence?
To showcase this a little further, I'll actually pass it back over to the Onyx voice to lead the final part of this video.
5. OpenAI TTS
Although OpenAI's text-to-speech models have been optimized for English, they also work for a number of other languages.
For example, here is a sentence in Spanish ("Me gusta mucho pasear por el parque cuando hace buen tiempo") and the same sentence in German ("Ich gehe sehr gern im Park spazieren, wenn das Wetter schön ist."). Both mean "I really enjoy walking in the park when the weather is nice".
The accent often isn't perfect, but it's still understandable to native speakers.
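As a rough sketch of how you might try this yourself: the speech request takes no separate language setting, so the non-English text is simply passed as the input. The helper name narrate_sentences, the language-code keys, the "park_<code>.mp3" file names, and the "tts-1" default are assumptions for illustration; client is expected to be an OpenAI client from the official openai package.

```python
from pathlib import Path

# The two example sentences from above, keyed by an illustrative language code
SENTENCES = {
    "es": "Me gusta mucho pasear por el parque cuando hace buen tiempo",
    "de": "Ich gehe sehr gern im Park spazieren, wenn das Wetter schön ist.",
}


def narrate_sentences(client, sentences, out_dir=".", model="tts-1", voice="onyx"):
    """Create one audio file per sentence and return a dict of output paths.

    `client` is assumed to be an openai.OpenAI() instance; the file naming
    scheme is a hypothetical choice for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for code, text in sentences.items():
        # Same call as for English: only the input text changes
        response = client.audio.speech.create(model=model, voice=voice, input=text)
        path = out_dir / f"park_{code}.mp3"
        response.stream_to_file(path)
        paths[code] = path
    return paths
```

Calling narrate_sentences(OpenAI(), SENTENCES) would then produce one MP3 per language that you can listen to and compare.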
6. Let's practice!
Thanks, Onyx! Now it's time for you to give these models a try!