Organizing the Workshop on the Representation, Sharing & Evaluation of Multimodal Interactions (June 13, 2022)

Organizing the Workshop on the Representation, Sharing and Evaluation of Multimodal Agent Interaction (MMAI2022) at the first International Conference on Hybrid Human-Artificial Intelligence (HHAI2022), June 13, 2022.

HHAI2022 is organised by the Dutch Hybrid Intelligence Center and the European HumaneAI Network as the first in what we intend to become a series of conferences on Hybrid Human-Artificial Intelligence.

Interaction is a real-world event that takes place in time and in physical or virtual space. By definition, it exists only while it happens. This makes it difficult to observe and study interactions, to share interaction data, to replicate or reproduce interactions, and to evaluate agent behaviour objectively. Interactions are also extremely complex, involving many variables whose values change from case to case: the physical circumstances differ, the participants differ, and past experiences affect the actual event. Moreover, the presence of cameras and experimenters itself influences the interaction, and the manpower needed to capture such data is high. Finally, privacy issues make it difficult to simply record interaction data and publish it freely.

It is therefore no surprise that interaction research progresses slowly. This workshop aims to bring together researchers from different backgrounds to explore how interaction research can become more standardised and scalable. Its goal is to explore how researchers and developers can share experiments and data involving multimodal agent interaction, and how these interactions can be compared and evaluated. Modelling and representing situations and contexts for effective interaction is a particular challenge in real-world physical settings. We therefore invite researchers and developers to share with us how and why they record multimodal interactions, whether their data can be shared or combined with other data, how systems can be trained and tested, and how interactions can be replicated. Machine learning communities such as computer vision and NLP have made rapid progress by creating competitive leaderboards based on benchmark datasets. Although this approach works well for training unimodal perception models, such datasets are not sufficient for interaction research, where multiple modalities must be considered together.