
Robust language understanding with Rasa NLU

1. Robust NLU with Rasa

In the next exercises, we will use a library called Rasa NLU, or natural language understanding. I am one of the creators and maintainers of this library.

2. Rasa NLU

Rasa NLU provides a high-level API for intent recognition and entity extraction, and has a number of components already built in which are useful for building chatbots.

3. Rasa data format

To use Rasa, we provide training data in a JSON file. JSON is a popular human-readable data format based on key-value pairs. We import the load_data function and call it, passing the path to the training data file as an argument, to create a training data object. This object contains a list of dictionaries called training examples. Each of these dictionaries contains an example message, its intent, and a list of entities found in the message. We can convert one of these dictionaries to readable JSON using the json.dumps function; indent=2 specifies the number of spaces to indent.
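A minimal sketch of what one training example looks like, and how json.dumps renders it; the message text and entity values here are illustrative, not taken from a real dataset:

```python
import json

# One training example: a message, its intent, and the entities it contains.
training_example = {
    "text": "show me a mexican place in the centre",
    "intent": "restaurant_search",
    "entities": [
        {"start": 10, "end": 17, "value": "mexican", "entity": "cuisine"},
        {"start": 27, "end": 33, "value": "centre", "entity": "location"},
    ],
}

# json.dumps converts the dictionary to a readable JSON string;
# indent=2 specifies the number of spaces to indent.
print(json.dumps(training_example, indent=2))
```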

4. Interpreters

The way to use Rasa in Python code is through an interpreter object. This contains your trained model for intents and entities. To use it, pass a message to the interpreter's parse method. This returns a dictionary with the extracted intent and entities. Now let's see how we can create an interpreter.
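The dictionary returned by the parse method has roughly the following shape; this is a sketch, and the message, confidence value, and entities shown here are made up rather than real model output:

```python
# Illustrative shape of the dictionary returned by interpreter.parse(message).
parse_result = {
    "text": "show me a mexican place in the centre",
    "intent": {"name": "restaurant_search", "confidence": 0.92},
    "entities": [
        {"start": 10, "end": 17, "value": "mexican", "entity": "cuisine"},
    ],
}

# The extracted intent and entities can then be read out directly:
intent_name = parse_result["intent"]["name"]
entity_values = [e["value"] for e in parse_result["entities"]]
```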

5. Rasa usage

To train our model we create a configuration and a trainer: First we import RasaNLUConfig and Trainer from rasa_nlu. We create a config object by calling RasaNLUConfig with a dictionary of parameters as the cmdline_args argument. This dict contains a pipeline key, which we'll explain in the next slide. To create the trainer we call Trainer with the config as its sole argument. We then call the trainer's train method, passing it the training data as an argument. When the model is trained, this returns our interpreter object.
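The steps above can be sketched as follows. Only the cmdline_args dictionary is plain Python; the training calls are shown as comments because they require the legacy rasa_nlu package (and a spaCy model) to be installed:

```python
# Arguments for RasaNLUConfig, passed as cmdline_args (a sketch).
# "spacy_sklearn" is the pre-defined pipeline discussed on the next slide.
args = {"pipeline": "spacy_sklearn"}

# With the legacy rasa_nlu package installed, training looks like:
#   from rasa_nlu.converters import load_data
#   from rasa_nlu.config import RasaNLUConfig
#   from rasa_nlu.model import Trainer
#
#   training_data = load_data("./training_data.json")
#   config = RasaNLUConfig(cmdline_args=args)
#   trainer = Trainer(config)
#   interpreter = trainer.train(training_data)  # returns the interpreter object
```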

6. Rasa pipelines

In the previous slide, we used the `spacy_sklearn` pipeline. A Rasa pipeline is a list of components that will be used to process text. The `nlp_spacy` component initializes the spaCy English model, then the `ner_crf` component uses a conditional random field entity recognizer, which you'll see in the next slide. The `ner_synonyms` component maps entities with the same meaning to the same key, for example, if we want to treat NYC and New York City as synonyms. The `intent_featurizer_spacy` component creates vector representations of sentences, exactly as you did in earlier exercises, by using spaCy's word vectors. And the `intent_classifier_sklearn` component is a scikit-learn support vector classifier. When the model is trained and used with this pipeline, these steps are performed automatically. When creating a Rasa config object, you can either specify the name of a pre-defined pipeline, or pass a list of the components you want to use.
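Spelling the pipeline out as an explicit component list looks like this; the component names follow the legacy rasa_nlu naming convention, and this list is a sketch of what the pre-defined `spacy_sklearn` pipeline covers rather than an exhaustive copy of it:

```python
# Passing a list of components instead of a pre-defined pipeline name.
pipeline = [
    "nlp_spacy",                  # initializes the spaCy English model
    "ner_crf",                    # conditional random field entity recognizer
    "ner_synonyms",               # maps synonymous entity values to one key
    "intent_featurizer_spacy",    # sentence vectors from spaCy word vectors
    "intent_classifier_sklearn",  # scikit-learn support vector classifier
]

# This list would go in the cmdline_args dict in place of "spacy_sklearn":
args = {"pipeline": pipeline}
```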

7. Conditional random fields

In order to train a custom entity recognizer for your domain, the recommended component is `ner_crf`, which uses conditional random fields. CRFs are a machine learning model; we won't go into the details here, but they work well for entity recognition when you have a small training dataset.

8. Handling typos

One downside of relying on word vectors is that if important words are misspelled, it can be very hard for the classifier to correctly predict intents. And for many words, there simply aren't any word vectors, because they didn't appear in the training corpus. Rasa NLU can remedy this if you include the `intent_featurizer_ngrams` component. This component looks at all the words in your training data for which there aren't word vectors (including misspellings), and looks for sub-word units, or character ngrams, which are predictive of the intent. For example, if the word "dollars" is an important indicator of a price request, this component will pick up the sequence d-o-l-l-a as an important one. You do of course have to make sure that your training data contains these out-of-vocabulary words and misspellings, or else the model won't be able to learn from them.
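The idea of character ngrams can be sketched in a few lines of plain Python; this helper is hypothetical and not part of Rasa NLU's API, it just shows what sub-word units of a word look like:

```python
# Hypothetical helper: collect all substrings of length n from a word.
def char_ngrams(word, n):
    return [word[i:i + n] for i in range(len(word) - n + 1)]

# The 5-grams of "dollars" include "dolla", the kind of sub-word unit
# an ngram featurizer could learn is predictive of a price request.
print(char_ngrams("dollars", 5))  # → ['dolla', 'ollar', 'llars']
```

With a smaller n, a misspelling of a word still shares many of its character ngrams, which is why these features help when word vectors are missing.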

9. Let's practice!