Comprehending text

1. Comprehending text

In the last lesson, we used the boto3 documentation and the AWS console to explore a new service, Rekognition. Data engineering often requires working with unstructured text data: customer comments, other free-text fields, tweets, or posts. A huge component of data engineering is enriching data with values that will be needed for analysis downstream. In this lesson, we will look at two services that let us work with free-text data without being Natural Language Processing experts: AWS Translate and AWS Comprehend. Let's dig in!

2. AWS Translate console

Because we now know the basics of working with AWS, learning additional services is pretty straightforward. We start with finding the AWS Translate service in the console and playing with it.

3. Translating text

Next, we look up the boto3 documentation. We see that there is one main method that will do this work for us: translate_text. First, let's initialize the boto3 client for AWS Translate. No surprises here, just a regular client initialization with the service name 'translate' as an argument. Then, we call the translate_text method. We pass the text we want to translate in the Text argument. We let Amazon detect the source language on its own by passing 'auto' for SourceLanguageCode. For our TargetLanguageCode, we pass 'es', the code for Spanish. We got this language code from the boto3 documentation.
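The call described here can be sketched roughly as follows. This is a minimal sketch rather than the lesson's exact code: the helper names make_translate_client and translate_to_spanish are hypothetical, and running it against AWS assumes boto3 is installed and that credentials and a region are already configured.

```python
def make_translate_client():
    """Create a boto3 client for AWS Translate.

    Assumes AWS credentials and a default region are configured.
    """
    import boto3  # imported lazily so the helper below can be tried with a stub client

    return boto3.client("translate")


def translate_to_spanish(client, text):
    """Translate text to Spanish, letting AWS detect the source language."""
    return client.translate_text(
        Text=text,
        SourceLanguageCode="auto",  # let AWS work out the source language
        TargetLanguageCode="es",    # Spanish
    )
```

With a real client, translate_to_spanish(make_translate_client(), "Hello, how are you?") would return the API response dictionary.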

4. Translating text

In response, we get a familiar-looking AWS response. The translated text is in the TranslatedText key. AWS also sends back the language it detected for the source in SourceLanguageCode, and the target language we specified in TargetLanguageCode.

5. Translating text

This structure lets us write a simple one-liner to get the translated text directly from the response.
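As a rough illustration of that one-liner, the sketch below uses a hand-written dictionary in place of a real API response. The values are made up for illustration, and the ResponseMetadata entry that AWS normally includes is left out.

```python
# Illustrative response shape only; the values are invented for this example.
response = {
    "TranslatedText": "Hola, ¿cómo estás?",
    "SourceLanguageCode": "en",  # the language AWS detected
    "TargetLanguageCode": "es",  # the language we asked for
}

# The one-liner: read the translation straight out of the response.
translated_text = response["TranslatedText"]
print(translated_text)
```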

6. Detecting language

When we pass 'auto' for SourceLanguageCode, AWS Translate works out which language is being used by calling another service, AWS Comprehend. We start by initializing the boto3 comprehend client. Then, we call the client's detect_dominant_language method, passing our string through the Text argument.
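A minimal sketch of this step might look like the following. The helper names make_comprehend_client and detect_language are hypothetical, and a real run assumes boto3 is installed with credentials and a region configured.

```python
def make_comprehend_client():
    """Create a boto3 client for AWS Comprehend.

    Assumes AWS credentials and a default region are configured.
    """
    import boto3  # imported lazily so the helper below can be tried with a stub client

    return boto3.client("comprehend")


def detect_language(client, text):
    """Ask Comprehend which language(s) the text is most likely written in."""
    return client.detect_dominant_language(Text=text)
```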

7. Detecting language

In the response, AWS returns a list of languages the text is likely written in, under the Languages key. If it has low confidence, it might return two or more languages, each with its own confidence score.
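When more than one language comes back, one reasonable approach is to keep the entry with the highest score. The sketch below assumes that approach and uses an invented response in place of a real one; the scores are illustrative only.

```python
# Illustrative response shape only; the codes and scores are invented.
response = {
    "Languages": [
        {"LanguageCode": "es", "Score": 0.25},
        {"LanguageCode": "en", "Score": 0.72},
    ]
}

# Keep the candidate Comprehend is most confident about.
top_language = max(response["Languages"], key=lambda lang: lang["Score"])
language_code = top_language["LanguageCode"]
print(language_code)
```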

8. Understanding sentiment

Translating is nice, but what if we wanted to detect the sentiment of a piece of text? This could be useful for processing tweets and customer comments. In the console, we can see that sentiment can be Neutral, Positive, Negative, or Mixed.

9. Understanding sentiment

We have already initialized the boto3 comprehend client. From the documentation, we know that we can use the client's detect_sentiment method to analyze the text. We call the method, passing the text through the Text argument and its language through the LanguageCode argument, and store the API response in the response variable.
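The call could be sketched along these lines. The helper name get_sentiment is hypothetical, and the client is assumed to have been created with boto3.client('comprehend'). Note that detect_sentiment takes a LanguageCode argument alongside Text; the default of 'en' here is an assumption for English input.

```python
def get_sentiment(client, text, language_code="en"):
    """Call Comprehend's detect_sentiment and return the full API response.

    language_code defaults to English; pass another supported code as needed.
    """
    return client.detect_sentiment(Text=text, LanguageCode=language_code)
```

With a real client, response = get_sentiment(comprehend, "I love this product!") would store the API response in the response variable.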

10. Understanding sentiment

The response contains a Sentiment key with the overall label, and a SentimentScore dictionary with a confidence score for each possible sentiment.

11. Understanding sentiment

This means that, again, a one-liner can read the Sentiment key directly from the response.
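To make the shape concrete, here is a small sketch with an invented response standing in for a real one. The scores are made up, and the ResponseMetadata entry that AWS normally includes is omitted.

```python
# Illustrative response shape only; the scores are invented for this example.
response = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Positive": 0.95,
        "Negative": 0.01,
        "Neutral": 0.03,
        "Mixed": 0.01,
    },
}

# The one-liner: read the overall label straight out of the response.
sentiment = response["Sentiment"]
print(sentiment)
```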

12. Review

In this lesson, we went on a tour of natural language processing in AWS. AWS Translate and AWS Comprehend make it easy to perform basic NLP on free text, allowing data engineers to enrich their data with sentiment and language information. To perform a translation, we initialize the boto3 translate client and call translate_text. If we don't know the source language, that's OK - AWS will figure it out for us if we pass 'auto' for SourceLanguageCode.

13. Review

AWS figures out the source language by using AWS Comprehend in the background. We can do that too by initializing the AWS Comprehend boto3 client. Then, we call the detect_dominant_language method to figure out what language a chunk of text is written in.

14. Review

We can also use AWS to detect sentiment using Comprehend's detect_sentiment method.
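Putting the two Comprehend calls together, an enrichment step might be sketched as below. This is one possible arrangement, not the lesson's code: the function name enrich_comments and the output keys text, lang, and sentiment are hypothetical, and the sketch assumes every detected language is one that detect_sentiment supports, which real code would need to check.

```python
def enrich_comments(comprehend, comments):
    """Attach a detected language and sentiment label to each comment.

    comprehend is assumed to be a boto3 Comprehend client.
    """
    enriched = []
    for text in comments:
        # Pick the language Comprehend is most confident about.
        languages = comprehend.detect_dominant_language(Text=text)["Languages"]
        lang = max(languages, key=lambda item: item["Score"])["LanguageCode"]

        # Feed that language code into the sentiment call.
        sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=lang)["Sentiment"]

        enriched.append({"text": text, "lang": lang, "sentiment": sentiment})
    return enriched
```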

15. Let's practice!

Now that we have some serious NLP in our pocket, let's practice!