
Text processing with regular expressions

1. Text processing with regular expressions

Regular expressions are a powerful tool for processing text.

2. Regular expressions

We will use them for matching messages against known patterns, for extracting key phrases, and for transforming sentences grammatically. These are the core pieces we need to create our ELIZA-style bot.

3. The regex behind ELIZA

Much of the magic of the ELIZA system relied on giving the _impression_ that the bot had understood you, even though the underlying logic was extremely simple. For example, if you asked ELIZA "do you remember when you ate strawberries in the garden?", she would respond: "How could I forget when I ate strawberries in the garden?". Part of what makes this example so compelling is the subject. We are asking about _memories_, which we associate with our conscious minds and our sense of self. The memory itself, of eating strawberries in the garden, evokes powerful emotions. But if we pick apart how the response is generated, we see that it's actually quite simple.

4. Pattern matching

To build an ELIZA-like system you need a few key components. The first is a simple pattern matcher. This consists of a set of rules for matching user messages, like "do you remember x". To match patterns we use a technology called *regular expressions*; to use these in Python we `import re`. Regular expressions are a way to define patterns of characters and then check whether those patterns occur in a string. In regular expressions, the dot character is special, and matches *any* character (by default, any character except a newline). The asterisk means "match 0 or more occurrences of the preceding pattern", so `.*` ("dot star") is basically a catch-all: it matches any run of characters. We can check whether a message matches a pattern by calling `re.search(pattern, message)`. If the pattern occurs in the message, this returns a match object; if it doesn't, it returns `None`, so we can check for a match using a simple if statement.
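As a minimal sketch of the matching step described above, the snippet below checks one message against one rule. The variable names and the printed strings are illustrative assumptions, not part of the original lesson.

```python
import re

# ".*" is the catch-all: any run of characters (newlines excluded by default)
pattern = "do you remember .*"
message = "do you remember when you ate strawberries in the garden"

# re.search returns a match object, or None when the pattern is not found
match = re.search(pattern, message)
if match:
    print("string matches!")
else:
    print("no match")
```

Because `None` is falsy and a match object is truthy, the plain `if match:` test is enough here.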

5. Extracting key phrases

Adding parentheses in the pattern string defines a group. A group is just a substring that we can retrieve after matching the string against the pattern. We use the match object's `group` method to retrieve the parts of the string that matched. The default group, with index 0, is the whole of the text that matched the pattern. The group with index 1 is the first group we defined by including parentheses in the pattern.
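To illustrate the difference between group 0 and group 1, here is a small sketch. The pattern and message are made-up examples chosen so that the matched text is shorter than the full message.

```python
import re

pattern = "if (.*)"
message = "what would happen if bots took over the world"

match = re.search(pattern, message)

# Group 0 is everything the pattern matched, not the entire message
whole = match.group(0)   # "if bots took over the world"

# Group 1 is the part captured by the parentheses
phrase = match.group(1)  # "bots took over the world"
```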

6. Grammatical transformation

To make responses grammatically coherent, we will want to transform the extracted phrases from first to second person and vice versa. In English, conjugating verbs is easy, and simply swapping "I" and "you", "my" and "your" works in most cases. We can use another function from the `re` module for this: `re.sub`. This substitutes patterns in a string. For example, take the sentence "I walk my dog". Using `re.sub` to replace "I" with "You" and "my" with "your" turns it into "You walk your dog".
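One possible way to sketch the pronoun swap is shown below. It is not necessarily how the lesson implements it: the `SWAPS` table and the `swap_pronouns` helper are hypothetical names, and the single-pass approach with a replacement function is an assumption chosen so that a word swapped once is not swapped back by a later substitution.

```python
import re

# Hypothetical swap table; a real bot would need more entries
SWAPS = {"i": "you", "my": "your", "you": "I", "your": "my"}

# \b keeps us from touching letters inside longer words (e.g. the "i" in "in")
PRONOUNS = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)

def swap_pronouns(phrase):
    # re.sub accepts a function that computes each replacement from the match
    return PRONOUNS.sub(lambda m: SWAPS[m.group(0).lower()], phrase)

swapped = swap_pronouns("I walk my dog")  # "you walk your dog"
```

This sketch does not restore sentence-initial capitalization, and it maps every "you" to "I", which is wrong when "you" is the object of the sentence. That is in line with the note above that the simple swap only works in most cases.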

7. Putting it all together

The final step is to combine these logical pieces together. We start with a pattern and a message. We extract the key phrase by creating a match object using `re.search`, and then use the `group` method to extract the substring captured by the parentheses. We then choose a response appropriate to this pattern, and swap the pronouns so that the phrase makes sense when the bot says it.

8. Putting it all together

We then insert the extracted phrase into the response, to partially echo back what the user talked about, giving the illusion that the bot has understood the question and remembers this experience.
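The whole pipeline can be sketched roughly as follows, using the strawberries example from earlier. The response template, the fallback reply, and the `swap_pronouns` helper are illustrative assumptions rather than the lesson's exact code.

```python
import re

# Hypothetical pronoun table and helper, applied in a single pass
SWAPS = {"i": "you", "my": "your", "you": "I", "your": "my"}
PRONOUNS = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)

def swap_pronouns(phrase):
    return PRONOUNS.sub(lambda m: SWAPS[m.group(0).lower()], phrase)

pattern = "do you remember (.*)"
message = "do you remember when you ate strawberries in the garden?"
response_template = "How could I forget {0}?"

match = re.search(pattern, message)
if match:
    # Extract the key phrase and drop the trailing question mark
    phrase = match.group(1).rstrip("?")
    # Swap pronouns, then echo the phrase back inside the chosen response
    response = response_template.format(swap_pronouns(phrase))
else:
    response = "Tell me more."  # assumed fallback when no rule matches

print(response)
```

A fuller bot would hold many pattern and response pairs and try each in turn; this sketch handles a single rule.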

9. Let's practice!

Now it's your turn to build your own ELIZA-style chatbot.
