1. Creating customer call transcripts
Welcome to the case study chapter! Let's now apply everything we've learned so far to a real-life application.
2. Case study introduction
Imagine you're an AI engineer at DataCamp. The support team wants to trial allowing users to submit support queries as voice messages. They would like you to build a chatbot that interprets these voice messages and provides a spoken response back to the user. Because of DataCamp's international learner base, the system must also support a wide array of languages.
Let's break this down!
3. Case study introduction
The chatbot should take these recordings, transcribe
4. Case study introduction
them into text, detect
5. Case study introduction
the language, translate
6. Case study introduction
it into English, generate
7. Case study introduction
a response, and then reply
8. Case study introduction
with spoken audio in the customer's native language.
It also needs a moderation
9. Case study introduction
system to filter out irrelevant messages and ensure polite responses before sending them back to the user.
That sounds like a complex system, so let's break it down into steps.
10. Case study plan
In this video, we'll focus on getting an accurate English transcript.
First, we'll transcribe the audio into text.
Then, we'll detect the language used and translate it into English.
Finally, we'll refine the translated text to correct any misinterpretations, especially with names and terminology.
Let's get started.
11. Step 1: transcribe audio
We've been given an audio recording in mp3 format. To process it, we first open the file in read-binary mode by passing "rb" as the mode argument.
Next, we make a request to the OpenAI audio endpoint for a transcription, specifying the model and passing our audio file.
12. Step 1: transcribe audio
We extract the transcript using the text attribute and print it. Right away, we see that it's not in English.
13. Step 2: detect language
To determine the language, we send a chat completion request, prompting the model to identify the transcript's language and return only the two-letter language code. We provide a few example codes, just so there's no misinterpretation of what we're requesting, and pass the transcript variable into the prompt using an f-string.
The model returns 'uk', the language code for Ukrainian. Now that we know the language, we can move on to translation.
14. Step 3: translate to English
To translate the text into English, we send another chat completion request. This time, we ask the model to translate the transcript while specifying the detected language.
15. Step 3: translate to English
Reading the translation, we see that the customer is asking for learning recommendations. However, we also notice that some technical terms have been misinterpreted.
16. Step 3: translate to English
The model did not correctly recognize names like DataCamp, or technologies like LangChain and AWS, which could lead to confusion in the response. We need to fix this.
17. Step 4: refining the text
We can send another chat completion request, this time asking the model to refine the translation by fixing the terminology. We pass the translated text again and let the model adjust it.
18. Step 4: refining the text
After printing the corrected text, we see that the names have been properly recognized. Now, the customer's request is clear and ready for processing.
19. Recap
Let's recap what we've done. We transcribed the audio to extract the customer's message, detected the language, and translated the text into English. Finally, we refined the output to correct any misunderstandings related to names and terminology. In total, we made four requests to the OpenAI API to complete these steps.
20. Time for practice!
Now it's your turn!