eXplainable Cooperative Machine Learning with NOVA
TECHNICAL CONTRIBUTION
Tobias Baur¹ · Alexander Heimerl¹ · Florian Lingenfelser¹ · Johannes Wagner¹ · Michel F. Valstar² · Björn Schuller¹ · Elisabeth André¹
¹ Augsburg University, Universitätstr. 6a, Augsburg, Germany
² University of Nottingham, Nottingham, UK
* Corresponding author: Tobias Baur, baur@hcm-lab.de
Received: 30 September 2019 / Accepted: 2 January 2020
© The Author(s) 2020
Abstract
In the following article, we introduce a novel workflow, which we subsume under the term "explainable cooperative machine learning", and show its practical application in a data annotation and model training tool called NOVA. The main idea of our approach is to interactively incorporate the 'human in the loop' when training classification models from annotated data. In particular, NOVA offers a collaborative annotation backend where multiple annotators can join forces. A key aspect is the possibility of applying semi-supervised active learning techniques already during the annotation process: data can be pre-labelled automatically, which drastically accelerates annotation. Furthermore, the user interface implements recent eXplainable AI techniques to provide users with both a confidence value for the automatically predicted annotations and a visual explanation. We show in a use-case evaluation that our workflow is able to speed up the annotation process, and we further argue that the additional visual explanations help annotators understand the decision-making process as well as the trustworthiness of their trained machine learning models.
Keywords Annotation · Cooperative machine learning · Explainable AI
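The confidence-based pre-labelling idea at the heart of this workflow can be illustrated with a minimal sketch (hypothetical code, not NOVA's actual API): a model trained on the segments annotated so far predicts labels for the remaining segments, and only predictions whose confidence exceeds a threshold are proposed to the annotator for review.

```python
# Minimal sketch of confidence-based pre-labelling, assuming scikit-learn
# and hypothetical feature matrices for labelled/unlabelled segments.
from sklearn.linear_model import LogisticRegression

def prelabel(X_labelled, y_labelled, X_unlabelled, threshold=0.85):
    """Train on the segments annotated so far and propose labels for
    the rest; only confident predictions are suggested to the annotator."""
    clf = LogisticRegression(max_iter=1000).fit(X_labelled, y_labelled)
    probs = clf.predict_proba(X_unlabelled)       # per-class confidences
    conf = probs.max(axis=1)                      # confidence of top class
    labels = clf.classes_[probs.argmax(axis=1)]   # predicted class labels
    confident = conf >= threshold                 # mask of trusted suggestions
    return labels, conf, confident                # annotator reviews the rest
```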
1 Motivation
In various research disciplines (Behavioural Psychology, Medicine, Anthropology, ...) the annotation of social behaviours is a common task. This process involves manually identifying relevant behaviour patterns in audio-visual material and assigning descriptive labels. Generally speaking, segments in the signals are mapped onto a set of discrete classes, e.g., a certain type of gesture, a social situation (e.g., conflict), or the emotional state of a person. In Affective Computing, a subset of these events, the so-called social signals, is used to augment the spoken part of a message with non-verbal information to enable a more natural human–computer interaction [54, 55]. To automatically detect social signals from raw sensory input (e.g., speech signals), machine learning (ML) can be applied. That is, sensory input is transformed into a compact set of relevant features, and a classifier is trained on manually labelled examples to optimise a learning function. Once trained, the classifier can be used to automatically predict labels on unseen data. However, since humans transmit non-verbal messages through a number of channels (voice, face, gestures, etc.) and due to the complex interplay between these channels (think, for instance, of a faked versus a real smile, which depends on subtle contractions of the muscles), detecting such signals automatically remains a challenging task.
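As a concrete illustration of this standard pipeline (a sketch under assumed inputs, not the system described later in this article): raw signal segments are reduced to fixed-length feature vectors, a classifier is fitted on the manually labelled segments, and the trained model is then applied to unseen ones. The feature function and the example data below are placeholders for real acoustic descriptors.

```python
# Sketch of the generic feature-extraction + classification pipeline
# described above, using scikit-learn and synthetic stand-in data.
import numpy as np
from sklearn.svm import SVC

def features(segment: np.ndarray) -> np.ndarray:
    """Collapse a raw signal segment into a compact feature vector
    (simple statistics here, standing in for real acoustic features
    such as MFCCs or prosodic descriptors)."""
    return np.array([segment.mean(), segment.std(),
                     np.abs(segment).max(), (segment ** 2).mean()])

# Manually labelled training segments -> train a classifier ...
rng = np.random.default_rng(0)
train_segments = [rng.normal(size=16000) for _ in range(20)]
train_labels = np.array([0, 1] * 10)        # e.g., 'no laughter' vs. 'laughter'
X_train = np.stack([features(s) for s in train_segments])
clf = SVC(probability=True).fit(X_train, train_labels)

# ... which then predicts labels (with confidences) for unseen segments.
X_new = features(rng.normal(size=16000)).reshape(1, -1)
print(clf.predict(X_new), clf.predict_proba(X_new))
```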