
Frontiers of dialogue research

1. Frontiers of dialogue technology

In this final video I want to talk about some more advanced machine learning techniques for building conversational software. Many of these have been used in applications, but they haven't been adopted widely for building chatbots yet. I believe that's because they still require a lot of expertise to use, and because in most cases they need a large amount of training data relevant to your domain.

2. A neural conversational model

There's a now-famous 2015 paper from Vinyals and Le called "A Neural Conversational Model" which uses the sequence-to-sequence neural network architecture to build a conversational agent.

3. Seq2seq

Seq2seq models are applicable to many different problems, but are best known for their success in machine translation systems. They can be applied with minimal modification to conversation modeling. The main idea is that the neural network reads an input sequence (in this case a user message) and, as it does so, incrementally builds up a 'hidden' vector representing the *meaning* of that sentence. The second half of the neural network runs the reverse operation, taking the 'hidden' vector as input and incrementally generating the output sequence. In the machine translation case, the input and output sequences would be a sentence in English and its French translation, for example. In the case of conversation modeling, they are the input message and the response. This approach is totally different from anything you've built in this course. It's compelling because it doesn't require us to define intents or entities or to write a bunch of responses; it's completely data driven. The disadvantages, for now, are that it requires a lot of data, there's no guarantee that the output will be relevant or appropriate, and it's not easy to integrate API calls and other external logic.
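You won't build one of these in this course, but to make the encoder-decoder idea concrete, here is a minimal sketch of a seq2seq model in Keras. The layer sizes, the vocabulary size, and the use of one-hot encoded tokens are illustrative assumptions, not details from the paper.

```python
# Minimal seq2seq sketch: an encoder LSTM compresses the input message into
# its final states (the 'hidden' meaning vector), and a decoder LSTM starts
# from those states to generate the response one token at a time.
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

vocab_size = 5000    # assumed size of the token vocabulary (one-hot encoded)
hidden_size = 256    # size of the 'hidden' meaning vector

# Encoder: read the user message, keep only the final states.
encoder_inputs = Input(shape=(None, vocab_size))
_, state_h, state_c = LSTM(hidden_size, return_state=True)(encoder_inputs)

# Decoder: initialised with the encoder's states, generates the response.
decoder_inputs = Input(shape=(None, vocab_size))
decoder_outputs = LSTM(hidden_size, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(vocab_size, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
```

Training such a model end-to-end on message-response pairs is exactly what makes the approach data hungry: the network has to learn grammar, world knowledge, and dialogue behaviour all from examples.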

4. Grounded dialogue systems

So far we've seen two extremes of how to handle dialogue. The systems you've built in this course have been completely hand-crafted, and the seq2seq models we just saw are completely data driven. The vast majority of research into dialogue systems over the last few decades sits in between these two extremes. Machine-learning based dialogue systems usually consist of four components: NLU, a dialogue state manager, API logic, and a natural language response generator. The API logic connects the bot's actions to the real world, so the conversations are 'grounded' in the knowledge contained in a database, knowledge base, or API. In most cases, the set of actions the bot can take is still hand-crafted, but the decision of which action to execute (i.e. the dialogue policy) is learned. Both supervised and reinforcement learning can be used to train these systems. To gather example conversations for supervised learning, it's easiest to have a human pretend to be the bot; this is also known as the Wizard of Oz technique. Once the trained policy has the basics right, it can be refined using reinforcement learning, where it receives a reward for each successful conversation and uses that signal to improve over time.
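To make the four-component picture concrete, here is a minimal sketch of how the pieces fit together. Every name in it (the `interpret` function, the slot names, the action labels) is a hypothetical illustration rather than a real library API, and the tiny training set stands in for conversations you would gather Wizard-of-Oz style.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# 1. NLU: turn a user message into an intent and entities (placeholder).
def interpret(message):
    return {"intent": "search_hotels", "entities": {"area": "north"}}

# 2. Dialogue state: accumulate what has been said so far.
class DialogueState:
    def __init__(self):
        self.slots = {}
    def update(self, parse):
        self.slots.update(parse["entities"])
    def featurize(self):
        # e.g. which slots have been filled so far
        return [int("area" in self.slots), int("price" in self.slots)]

# 3. Dialogue policy: a classifier trained on example conversations
#    to predict the next action from the current state.
X = np.array([[0, 0], [1, 0], [1, 1]])            # toy state features
y = ["ask_area", "ask_price", "query_database"]   # toy next actions
policy = LogisticRegression().fit(X, y)

# 4. API logic + response generation: ground the chosen action in a
#    database or API call, then produce a message for the user.
def execute(action, state):
    if action == "query_database":
        results = []  # here you would query your database or API
        return "Here are some results which match your query"
    elif action == "ask_price":
        return "What price range are you looking for?"
    return "Which area of town are you interested in?"

# One turn of the loop
state = DialogueState()
state.update(interpret("I'm looking for a hotel in the north"))
action = policy.predict([state.featurize()])[0]
print(execute(action, state))
```

Notice that the actions and responses are still hand-crafted; only the choice of which action to take next is learned from data, which is what distinguishes these systems from both extremes above.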

5. Language generation

The final topic I want to talk about is language generation with neural networks. For practical reasons, if you're building a bot I really don't recommend you use this. It's much, much simpler to write out 5 different ways of saying "Here are some results which match your query" than to train a neural net to come up with grammatically correct, culturally appropriate responses that include the right information. However, this is a fun topic and an active area of interest. In the final exercise you'll have access to a pre-trained neural network which can generate text. It was trained on the scripts of every episode of the Simpsons, so it should have some interesting behavior.
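For comparison, the hand-crafted approach really is this simple: write a handful of templates and pick one at random. The exact wordings below are just examples.

```python
# Hand-written response variation: a few templates plus random choice is
# usually enough to keep a bot from sounding repetitive.
import random

RESULT_RESPONSES = [
    "Here are some results which match your query",
    "I found a few matches for you",
    "These results match what you asked for",
    "Here's what I found for your query",
    "Take a look at these matching results",
]

def respond_with_results():
    return random.choice(RESULT_RESPONSES)
```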

6. Generating sample text

To generate text, we can use the `sample_text` function. The most interesting parameter is the 'temperature', which controls how risky the network is when generating text. At very low temperatures, it just repeats the most common combinations of letters, and at very high temperatures it generates complete gibberish.
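The `sample_text` function is provided for you in the exercise, so you don't need to implement it yourself. For the curious, here is a rough sketch of what temperature sampling typically looks like under the hood; the function name and signature are my own illustration, not the exercise's code.

```python
import numpy as np

def sample_next_char(probs, temperature=1.0):
    """Pick the next character index from the network's predicted distribution."""
    # Rescale the probabilities: a low temperature sharpens the distribution
    # (safe, repetitive text), a high temperature flattens it (risky, often
    # gibberish).
    logits = np.log(np.asarray(probs) + 1e-8) / temperature
    rescaled = np.exp(logits) / np.sum(np.exp(logits))
    return np.random.choice(len(rescaled), p=rescaled)

# e.g. with a cautious temperature, the most likely character dominates:
sample_next_char([0.5, 0.3, 0.2], temperature=0.2)
```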

7. Let's practice!
