Tracking multimodal communication in humans and agents
Funded by the National Science Foundation
We have made a movie in which our agent explains what the project is about.
When we talk to somebody, we engage in a rich complex of communicative activities: facial expressions, hand gestures, and direction of gaze, to name but the most obvious. The interpretation of "are you hungry" depends on the context (e.g. just before going to a restaurant or during dinner), on prosody (e.g. stress on "you" or "hungry"), on facial expressions (e.g. brows raised, or brows and gaze both raised), and on gestures (e.g. rubbing the stomach or pointing at a restaurant). We know that hearers adapt to the speaker (e.g. by maintaining the theme of the conversation or smiling). Research into the interaction of these channels is, however, limited, and often focuses on the interaction between a single pair of channels.
The iMAP (Intelligent MapTask Agent) project investigates multimodal communication
in humans and agents, focusing on two linguistic modalities (prosody and dialogue
structure), which reflect major communicative events, and two non-linguistic
modalities (eye gaze and facial expressions). It aims to determine:
1. which of the non-linguistic modalities align with events marked by prosody and dialogue structure, and with one another;
2. whether, and if so when, these modalities are observed by the interlocutor;
3. whether the correct use of these channels actually aids the interlocutor's comprehension.
Answers to these questions should provide a better understanding of how communicative resources are used in discourse and can subsequently aid the development of more effective animated conversational agents. The research resulting from this project will benefit a wide variety of fields, including cognitive science, computational linguistics, artificial intelligence, and computer science. In addition, integrating the modalities into a working model will advance the development and use of intelligent conversational systems.