
Understanding intents and entities

1. Understanding intents and entities

This chapter is all about the topic of NLU, or natural language understanding. NLU is a subfield of natural language processing, NLP, and is usually concerned with converting freeform text into structured data within a particular domain.

2. An example

For example, a restaurant booking bot should be able to understand a sentence like "I'm looking for a Mexican restaurant in the centre of town" and query a database or an API to find matching results. To do this, we need to identify the intent of the message, and extract a set of relevant entities.

3. Intents

An intent is a broad description of what a person is trying to say. For example, "hello", "hi", and "yoyoyo" are all ways that people might `greet` your bot. The example sentence I just mentioned could sensibly be described with the intent `restaurant_search`. There are many different ways someone might express this intent, for example:

- I'm hungry
- Show me good pizza spots
- I want to take my boyfriend out for sushi

Now, there is no universally correct way to assign intents to sentences. The 'correct' answer depends on your application. For example, if you expand your bot's capabilities so that it can actually book a table for you, the final sentence "I want to take my boyfriend out for sushi" might better be described as a `request_booking` intent than as a restaurant search.

4. Entities

The second part of the NLU problem is to extract `entities` from the text. In a restaurant search message, this means correctly identifying, say, `june tenth` as a date, `sushi` as a cuisine type and `new york city` as a location. A well-studied problem in NLP is "Named Entity Recognition" (NER). This is almost exactly the same problem we are describing here, with the difference that NER usually aims to find 'universal' entities like the names of people, organizations, dates, etc. In the case of bots, you often want narrower entity definitions that are specific to your domain.
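To make "structured data" concrete, here is a rough sketch of what the result of understanding the earlier example sentence might look like. The dictionary layout and the key names (`intent`, `entities`, `cuisine`, `location`) are assumptions made for illustration, not a format prescribed by any particular library.

```python
# The free-form message from the earlier example.
message = "I'm looking for a Mexican restaurant in the centre of town"

# A hypothetical structured result: one intent plus the extracted entities.
# The key names here are illustrative assumptions.
parsed = {
    "intent": "restaurant_search",
    "entities": {
        "cuisine": "Mexican",
        "location": "centre",
    },
}

# Downstream code can now work with fields instead of raw text,
# for example to build a database or API query.
print(parsed["intent"])
print(parsed["entities"])
```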

5. Regular expressions to recognize intents

In the next couple of exercises, you will build regular expressions for recognizing intents and entities. This is much simpler than the machine learning approaches we'll use in later parts of the chapter, and is highly computationally efficient. The main drawback is that writing and debugging regular expressions becomes really hard as your chatbot becomes more sophisticated.

6. Using regular expressions

We will use regex to look for keywords in text. We can build expressions which match any one of a set of keywords by using the pipe '|' operator. This corresponds to the logical operation OR. Remember that we can check whether a string matched a pattern by checking whether the result of the search is a match object or `None`. For example, to look for the keywords "hello", "hey", or "hi" we can write "hello|hey|hi". Notice, however, that this is just a string of characters, so "hi" will also match inside the words "which", "this", etc.
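As a minimal sketch of this keyword approach, the snippet below uses `re.search` and compares the result with `None`. The helper name `matches` and the test messages are made up for illustration.

```python
import re

# Keyword pattern: '|' means "hello" OR "hey" OR "hi".
greeting = "hello|hey|hi"

def matches(pattern, message):
    """Return True if the pattern is found anywhere in the message."""
    # re.search returns a match object on success and None otherwise.
    return re.search(pattern, message) is not None

print(matches(greeting, "hey there"))     # True
print(matches(greeting, "which one?"))    # True: "hi" occurs inside "which"
print(matches(greeting, "good morning"))  # False
```

The second result is the false positive described above: the pattern has no notion of where a word starts or ends.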

7. Using regular expressions

We can add the word boundary expression "\b" at the start and end to indicate that there shouldn't be any word characters (letters, digits, or underscores) on either side of our keyword. Notice that we've put an 'r' before the start of the string. This creates a so-called raw string, in which backslashes are kept literally instead of being interpreted as Python string escapes; without it, "\b" would be read as a backspace character.
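A short sketch of the word-boundary version follows. The grouping parentheses and the helper name `is_greeting` are choices made for this example; the pattern is case-sensitive as written.

```python
import re

# Raw string so that \b reaches the regex engine as a word boundary.
# The parentheses make \b apply to every alternative, not just the first and last.
greeting = r"\b(hello|hey|hi)\b"

def is_greeting(message):
    """Return True if the message contains one of the keywords as a whole word."""
    return re.search(greeting, message) is not None

print(is_greeting("hi there"))     # True
print(is_greeting("which one?"))   # False: "hi" is inside a longer word
print(is_greeting("this is it"))   # False
```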

8. Using regex for entity recognition

If we're going to use a pattern multiple times, we can create a pattern object using the `re.compile` method. The pattern we've defined here uses some new syntax. Square brackets define a set of characters, here written as a range. As before, the asterisk means "0 or more occurrences of this pattern", so the final part of the expression means "0 or more lower case letters". The first part of the pattern matches exactly one upper case letter. So this pattern will match any capitalized word. The `findall` method of the pattern object conveniently extracts all the matching substrings, so to find all the capitalized words in a sentence, we can run `pattern.findall`, passing the sentence as an argument.
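The exact pattern from the slide is not reproduced in this transcript, so the sketch below assumes `[A-Z][a-z]*`, which fits the description (one upper case letter followed by zero or more lower case letters). The example sentence is also made up.

```python
import re

# Assumed pattern: one upper case letter, then zero or more lower case letters.
pattern = re.compile(r"[A-Z][a-z]*")

message = "Mary is a friend of mine. She went to Paris with Tom."

# findall returns every non-overlapping match as a list of strings.
names = pattern.findall(message)
print(names)  # ['Mary', 'She', 'Paris', 'Tom']
```

Note that "She" is picked up too: the pattern finds any capitalized word, including ordinary words at the start of a sentence, so it is only a rough proxy for names and places.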

9. Let's practice!

Now it's your turn to write some regular expressions, and use them to get intents and entities from the messages your bot receives.