Annotating Reference and Coreference in Dialogue Using Conversational Agents in Games

A project funded by the EPSRC, grant number EP/W001632/1

The development of modern neural network architectures such as the encoder/decoder model and the Transformer has brought about an explosion of interest in neural models for AI systems able to engage in conversations (also known as conversational agents). This interest is reflected in a spike of published work, dedicated workshops, and industry-sponsored competitions and grants. While these models were at first applied to simple chatbots, the focus of research has been shifting towards conversational agents capable of engaging in more complex, task-oriented dialogue such as restaurant booking or question answering. However, results on these tasks show that end-to-end architectures without dedicated models for semantic interpretation can work well for chatbots, but conversational agents carrying out more complex tasks require a greater ability to handle semantic interpretation and some form of context modelling.

Among the aspects of natural language interpretation that require more advanced architectures are COREFERENCE and REFERENCE. As an example of the importance of coreference in dialogue, consider the following excerpt from a real-life chat conversation, in which both participants repeatedly use anaphoric expressions such as BOTH, THEY, and IT to refer to previously introduced entities such as Google or Microsoft.

  • A: Are you a fan of Google or Microsoft?
  • B: Both are excellent technology they are helpful in many ways. For the security purpose both are super.
  • A: I'm not a huge fan of Google, but I use it a lot because I have to. I think they are a monopoly in some sense.
  • B: Google provides online related services and products, which includes search engine and cloud computing.
  • A: Yeah, their services are good. I'm just not a fan of intrusive they can be on our personal lives

Enriching conversational agents with the ability to carry out these forms of interpretation raises two issues. First, developing models for these tasks requires specific training data. Most deep-learning architectures are trained on large amounts of freely available written text, but training a coreference resolver on written text and then domain-adapting it to dialogue has proven ineffective, because coreference in dialogue involves different phenomena and is more complex than coreference in text. Second, the resulting architectures require dedicated modules that enable them to interpret coreference and reference. As regards the data problem, our group has pioneered the use of Games-With-A-Purpose (GWAPs) to collect data for NLP, resulting in the largest NLP dataset collected using GWAPs or indeed crowdsourcing. But there is a fundamental difference between conversation and written text: written text is designed to be read by third parties, whereas research has shown that overhearers of a conversation acquire only a partial understanding of what was said. Players who annotate a conversation as outsiders are therefore unlikely to interpret its references as reliably as its participants.

OUR PROPOSED SOLUTION to the problem of creating large annotated datasets of coreference and reference interpretation in conversation is to collect judgments about anaphoric and referential information via GAMES IN WHICH CONVERSATIONAL AGENTS INTERACT WITH HUMAN PLAYERS AND EVOLVE BY ACQUIRING INFORMATION FROM THEM. This idea builds on recent work by Facebook and Microsoft, among others, which pioneered the use of conversational agents in games to collect data about dialogue, and on work by Hockenmaier and her lab. Our agents will be deployed on gaming platforms such as LIGHT and MINECRAFT in collaboration with these labs. However, whereas in previous work conversational agents interact only with the aim of improving their end-to-end behavior, in the proposed project we will develop artificial agents able to improve their ability to interpret coreference and reference by asking the players CLARIFICATION QUESTIONS at appropriate moments. The judgments collected in this way can also be used to annotate a dataset.