Recognition of Emotional States in Natural Human-Computer Interaction
Abstract. Affective and human-centered computing have attracted a lot of attention during the past years, mainly due to the abundance of environments and applications able to exploit and adapt to multimodal input from the users. The combination of facial expression …
R. Cowie¹, E. Douglas-Cowie¹, K. Karpouzis², G. Caridakis², M. Wallace³ and S. Kollias²

¹ School of Psychology, Queen's University, University Road, Belfast, BT7 1NN, Northern Ireland, UK
² Image, Video and Multimedia Systems Laboratory, National Technical University of Athens, 15780, Zographou, Athens, Greece
³ Department of Computer Science, University of Indianapolis, Athens Campus, 9 Ipitou St., GR-105 57 Athens, Greece

{r.cowie, e.douglas-cowie}@qub.ac.uk, [email protected], {kkarpou, gcari, skollias}@image.ntua.gr
6.1 Introduction

The introduction of the term 'affective computing' by R. Picard [190] epitomizes the fact that computing is no longer considered a 'number crunching' discipline, but is increasingly regarded as a means of interfacing between humans and machines, and sometimes even among humans themselves. To achieve this, application design must take into account the ability of humans to provide multimodal input to computers, thus moving away from the monolithic window-mouse-pointer interface paradigm and utilizing more intuitive concepts, closer to human niches ([191, 192]). A large part of this naturalistic interaction concept is expressivity [193], both in terms of interpreting the user's reaction to a particular event and in terms of taking their emotional state into account and adapting the presentation to it, since this reduces the learning curve of conventional interfaces and makes less technology-savvy users feel more comfortable. In this framework, both speech and facial expressions are of great importance, since they usually provide a comprehensible view of users' reactions; indeed, Cohen commented on the emergence and significance of multimodality, albeit in a slightly different human-computer interaction (HCI) domain, in [194, 195], while Oviatt [196] indicated that an interaction pattern constrained to mere 'speak-and-point' accounts for only a very small fraction of all spontaneous multimodal utterances in everyday HCI [197]. In the context of HCI, [198] defines a multimodal system as one that 'responds to inputs in more than one modality or communication channel', while Mehrabian [199] suggests that facial expressions and vocal intonations are the main means by which people estimate a person's affective state [200], with the face being judged more accurately, or correlating better with judgments based on full audiovisual input, than the voice ([198, 201]). This fact has led to a number of approaches using video and audio to tackle emotion recognition in a multimodal manner ([202–205]), while recently the visual modality has been extended to include facial, head or body gesturing ([206, 207], extended in [208]). Additional factors that contribute to the complexity of estimating expressivity in everyday HCI are the