1. Sentiment analysis on spoken language text
Now that you have some helper functions ready, it's time to start extracting information from the transcribed text. Your proposal to Acme Studios suggested that sentiment analysis, the process of figuring out whether text is positive, neutral or negative, would be helpful, and they agreed.
Knowing the sentiment of different calls may help them figure out where customers are having the most trouble.
To do sentiment analysis, you decide to use the NLTK Python library.
2. Installing sentiment analysis libraries
To begin, you install NLTK using pip. Then you use NLTK's download function to fetch the packages needed for sentiment analysis: punkt and vader_lexicon.
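A minimal sketch of the setup step. It assumes NLTK has already been installed from a terminal with pip and that the machine can reach NLTK's data server. It also downloads punkt_tab alongside punkt, on the assumption that recent NLTK releases load the sentence tokenizer from punkt_tab instead of punkt.

```python
# Run once from a terminal first (a shell command, not Python):
#   pip install nltk
import nltk

# Fetch the data packages used in this lesson. nltk.download() returns
# True if the package was downloaded or is already up to date.
punkt_ok = nltk.download("punkt", quiet=True)
# Assumption: recent NLTK releases look for "punkt_tab" rather than
# "punkt" when splitting sentences, so fetch both to be safe.
punkt_tab_ok = nltk.download("punkt_tab", quiet=True)
vader_ok = nltk.download("vader_lexicon", quiet=True)
print(punkt_ok, punkt_tab_ok, vader_ok)
```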
Since we don't have a large enough dataset to train our own sentiment analysis model, we'll use NLTK's VADER (Valence Aware Dictionary and sEntiment Reasoner). VADER is ready to use immediately because it relies on a prebuilt sentiment lexicon and a set of rules, so there is no model for us to train.
VADER works by looking up each word of a piece of text in its lexicon and giving it a sentiment score. It was designed with social media text in mind, but it should work well enough for our proof of concept.
3. Sentiment analysis with VADER
To start sentiment analysis, you import the SentimentIntensityAnalyzer class from the nltk.sentiment.vader module. Then you create an instance of SentimentIntensityAnalyzer and save it to a variable conventionally named sid.
You can then get the sentiment scores for a piece of text by calling the polarity_scores() method on sid and passing it a string.
polarity_scores() returns a dictionary with four values: neg for negative, neu for neutral, pos for positive and compound for an overall score. The more negative a piece of text is, the higher its neg score will be. Likewise, the more positive it is, the higher its pos score.
If the text falls somewhere in the middle, neu will be higher. The compound value can be thought of as the overall score, ranging from -1 (most negative) to +1 (most positive).
4. Sentiment analysis on transcribed text
You try out the sentiment analysis on one of your transcribed phone calls using only the customer channel.
Comparing the transcription with what you hear in the audio file, you can see that it isn't perfect. Still, the sentiment scores lean in the right direction. The sentiment is fairly neutral, since the customer hasn't received their product yet.
5. Sentence by sentence
From your experience with sentiment analysis, you know that sentiment can change from sentence to sentence. But your current transcription function doesn't return sentences, only one large block of text.
In your proposal, you mentioned this to Acme, and they allocated a budget for you to try a paid transcription API. You transcribe the same audio files with a paid API service and find that it returns punctuated sentences.
Using NLTK's sent_tokenize function, you break the transcription into sentences and analyze the sentiment sentence by sentence.
6. Sentence by sentence
This is helpful because it allows you to figure out which parts of the conversation the customer may be most displeased with.
You can see that the sentence where the transcription says "this service is terrible" gets a negative compound score.
7. Time to code!
It's still early, but you're starting to see some insights you can report back to Acme. Let's code!