A Reasoning Component for Information-Seeking and Planning Dialogues
Motivated by the need to make human-machine information-seeking dialogue as efficient and user-friendly as possible, we propose a logic-based reasoning component for a Spoken Language Dialogue Systems architecture. This component, called Problem Assist
Text, Speech and Language Technology VOLUME 28
Series Editors: Nancy Ide, Vassar College, New York; Jean Véronis, Université de Provence and CNRS, France
Editorial Board: Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands; Kenneth W. Church, AT&T Bell Labs, New Jersey, USA; Judith Klavans, Columbia University, New York, USA; David T. Barnard, University of Regina, Canada; Dan Tufis, Romanian Academy of Sciences, Romania; Joaquim Llisterri, Universitat Autònoma de Barcelona, Spain; Stig Johansson, University of Oslo, Norway; Joseph Mariani, LIMSI-CNRS, France
The titles published in this series are listed at the end of this volume.
Spoken Multimodal Human-Computer Dialogue in Mobile Environments
Edited by
W. Minker, University of Ulm, Germany
Dirk Bühler, University of Ulm, Germany
and
Laila Dybkjær, University of Southern Denmark, Odense, Denmark
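$$I_x = \frac{\partial I}{\partial x}, \quad I_y = \frac{\partial I}{\partial y}, \quad I_t = \frac{\partial I}{\partial t}, \quad u = \frac{dx}{dt}, \quad v = \frac{dy}{dt} \tag{3.4}$$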
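The derivatives in Equation (3.4) are defined for a continuous intensity function, so in practice they have to be approximated from sampled frames. The following is a minimal sketch, assuming two consecutive greyscale frames held as NumPy arrays. The function name `image_derivatives` and the choice of finite-difference scheme are illustrative assumptions, not the scheme used in the chapter.

```python
import numpy as np

def image_derivatives(frame_a, frame_b):
    """Approximate I_x, I_y and I_t from two consecutive greyscale frames.

    Assumption: spatial derivatives are taken on the mean of the two frames,
    and the temporal derivative is a plain frame difference.
    """
    a = np.asarray(frame_a, dtype=float)
    b = np.asarray(frame_b, dtype=float)
    mean = 0.5 * (a + b)
    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    I_y, I_x = np.gradient(mean)
    I_t = b - a
    return I_x, I_y, I_t

# Synthetic check: a linear intensity ramp that brightens by 1 per frame.
ys, xs = np.mgrid[0:5, 0:6]
frame_a = 2.0 * xs + 3.0 * ys
frame_b = frame_a + 1.0
I_x, I_y, I_t = image_derivatives(frame_a, frame_b)
```

On this synthetic ramp the three arrays are constant, which makes the sketch easy to check by hand before applying it to real video frames.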
u(x, y) and v(x, y) are the horizontal and vertical components of the optical flow at a point (x, y). Since u(x, y) and v(x, y) cannot be determined from Equation (3.3) alone, we incorporate a second constraint, called the "smoothness constraint":

$$\iint \left\{ (u_x^2 + u_y^2) + (v_x^2 + v_y^2) \right\} dx\,dy \rightarrow \min \tag{3.5}$$
where $u_x$, $u_y$, $v_x$ and $v_y$ denote the partial derivatives of u and v with respect to x and y, respectively. This constraint means that the square of the magnitude of the gradient of u(x, y) and v(x, y) at each point must be minimised. In other words, the optical flow velocity changes smoothly between any two neighbouring pixels in an image. The optical flow vectors u(x, y) and v(x, y) are computed under the two constraints (3.3) and (3.5):

$$\iint \left\{ (u_x^2 + u_y^2) + (v_x^2 + v_y^2) + \mu (I_x u + I_y v + I_t)^2 \right\} dx\,dy \rightarrow \min \tag{3.6}$$
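where $\mu$ is a weighting factor, which is set to 0.01 throughout our experiments. This minimisation can be accomplished by a variational approach with iterations:

$$u^{k+1}_{x,y} = \bar{u}^{k}_{x,y} - \frac{\mu I_x \left( I_x \bar{u}^{k}_{x,y} + I_y \bar{v}^{k}_{x,y} + I_t \right)}{1 + \mu \left( I_x^2 + I_y^2 \right)}$$

$$v^{k+1}_{x,y} = \bar{v}^{k}_{x,y} - \frac{\mu I_y \left( I_x \bar{u}^{k}_{x,y} + I_y \bar{v}^{k}_{x,y} + I_t \right)}{1 + \mu \left( I_x^2 + I_y^2 \right)}$$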
where $u^{k}_{x,y}$ is the k-th estimate of the horizontal optical flow vector at (x, y), and $\bar{u}^{k}_{x,y}$ is the k-th estimated average of $u^{k}_{x,y}$ over the neighbouring pixels. $v^{k}_{x,y}$ and $\bar{v}^{k}_{x,y}$ are the corresponding vertical components.
An example of the optical flow analysis is shown in Figure 1. The left image (a) is extracted from a video sequence at a certain time, and the right image (b) is the next frame. The computed optical flow velocities are shown every 8 pixels in (c). The horizontal axis indicates the x coordinate in the image, and the vertical axis indicates the y coordinate. Each line or dot in this figure illustrates the magnitude and direction of the optical flow velocity at the corresponding position in images (a) and (b).
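The iterative minimisation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes the standard Horn-Schunck style update for a functional whose data term is weighted by $\mu$, and it takes the neighbourhood average as the mean of the four direct neighbours with replicated borders. The names `neighbour_average` and `optical_flow` and the parameter `n_iter` are hypothetical. Only the default weight of 0.01 comes from the text.

```python
import numpy as np

def neighbour_average(f):
    """Mean of the four direct neighbours (assumed stencil; edges replicated)."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def optical_flow(I_x, I_y, I_t, mu=0.01, n_iter=100):
    """Iteratively estimate the flow components (u, v) from image derivatives.

    Each pass applies
        u <- u_bar - mu * I_x * (I_x*u_bar + I_y*v_bar + I_t) / (1 + mu*(I_x^2 + I_y^2))
    and the analogous update for v. n_iter is an assumed stopping rule.
    """
    I_x = np.asarray(I_x, dtype=float)
    I_y = np.asarray(I_y, dtype=float)
    I_t = np.asarray(I_t, dtype=float)
    u = np.zeros_like(I_x)
    v = np.zeros_like(I_x)
    denom = 1.0 + mu * (I_x ** 2 + I_y ** 2)
    for _ in range(n_iter):
        u_bar = neighbour_average(u)
        v_bar = neighbour_average(v)
        # Residual of the brightness constraint I_x*u + I_y*v + I_t = 0
        residual = I_x * u_bar + I_y * v_bar + I_t
        u = u_bar - mu * I_x * residual / denom
        v = v_bar - mu * I_y * residual / denom
    return u, v

# Synthetic check: a pattern I(x, t) = x - t shifts right by one pixel per
# frame, so I_x = 1, I_y = 0, I_t = -1 everywhere. A larger weight than the
# text's 0.01 is used here only so that the toy example converges quickly.
shape = (4, 5)
u, v = optical_flow(np.ones(shape), np.zeros(shape), -np.ones(shape),
                    mu=1.0, n_iter=50)
```

For a display like the one described for Figure 1(c), the dense field can be subsampled with slices such as `u[::8, ::8]` and `v[::8, ::8]` before plotting.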
3. A Multimodal Speech Recognition System
3.1 Feature Extraction and Fusion
Figure 2 shows the structure of our audio-visual multimodal speech recognition system. Speech signals are recorded at a 16 kHz sampling rate, and a speech frame