Segmenting Route Descriptions for Mobile Devices
The provision of information on mobile devices introduces interesting challenges. The most obvious of these is that ways have to be found of optimising the use of the limited available space; however, we also have to take account of the fact that, unlike
Text, Speech and Language Technology VOLUME 28
Series Editors: Nancy Ide, Vassar College, New York; Jean Véronis, Université de Provence and CNRS, France. Editorial Board: Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands; Kenneth W. Church, AT&T Bell Labs, New Jersey, USA; Judith Klavans, Columbia University, New York, USA; David T. Barnard, University of Regina, Canada; Dan Tufis, Romanian Academy of Sciences, Romania; Joaquim Llisterri, Universitat Autònoma de Barcelona, Spain; Stig Johansson, University of Oslo, Norway; Joseph Mariani, LIMSI-CNRS, France
The titles published in this series are listed at the end of this volume.
Spoken Multimodal Human-Computer Dialogue in Mobile Environments
Edited by
W. Minker, University of Ulm, Germany
Dirk Bühler, University of Ulm, Germany
and
Laila Dybkjær, University of Southern Denmark, Odense, Denmark
\[
\frac{dI}{dt} = \frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = 0
\]
\[
u = \frac{dx}{dt}, \qquad v = \frac{dy}{dt} \tag{3.4}
\]
u(x, y) and v(x, y) correspond to the horizontal and vertical components of the optical flow at a point (x, y). Since we cannot determine u(x, y) and v(x, y) using Equation (3.3) alone, we incorporate another constraint, called the "smoothness constraint":
\[
\iint \left\{ \left( u_x^2 + u_y^2 \right) + \left( v_x^2 + v_y^2 \right) \right\} dx\, dy \;\to\; \min \tag{3.5}
\]
where ux, uy, vx and vy denote the partial derivatives of u and v with respect to x and y, respectively. This constraint means that the square of the magnitude of the gradient of u(x, y) and v(x, y) at each point must be minimised; in other words, the optical flow velocity changes smoothly between any two neighbouring pixels in an image. The optical flow vectors u(x, y) and v(x, y) are computed under the two constraints (3.3) and (3.5):
\[
\iint \left\{ \left( u_x^2 + u_y^2 \right) + \left( v_x^2 + v_y^2 \right) + \mu \left( I_x u + I_y v + I_t \right)^2 \right\} dx\, dy \;\to\; \min \tag{3.6}
\]
where μ is a weighting factor, which is set to 0.01 throughout our experiments. This minimisation can be accomplished by a variational approach with iterations:
\[
u_{x,y}^{k+1} = \bar{u}_{x,y}^{k} - \frac{I_x \left( I_x \bar{u}_{x,y}^{k} + I_y \bar{v}_{x,y}^{k} + I_t \right)}{1/\mu + I_x^2 + I_y^2}
\]
\[
v_{x,y}^{k+1} = \bar{v}_{x,y}^{k} - \frac{I_y \left( I_x \bar{u}_{x,y}^{k} + I_y \bar{v}_{x,y}^{k} + I_t \right)}{1/\mu + I_x^2 + I_y^2}
\]
where u^k_{x,y} is the kth estimated horizontal optical flow vector at (x, y), and ū^k_{x,y} is the kth estimated average of u^k_{x,y} over the neighbouring pixels. v^k_{x,y} and v̄^k_{x,y} are the corresponding vertical components. An example of the optical flow analysis is shown in Figure 1. The left image (a) is extracted from a video sequence at a certain time, and the right image (b) is the next picture. Computed optical flow velocities are shown every 8 pixels in (c). The horizontal axis indicates the x coordinate in an image, and the vertical axis indicates the y coordinate. Each line or dot in this figure illustrates the amount and direction of the optical flow velocity at the corresponding position in images (a) and (b).
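The iterative scheme above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes greyscale frames given as float arrays, estimates the gradients Ix, Iy, It by finite differences, and approximates the neighbour average with a 4-neighbour mean using wrap-around borders; the function names are hypothetical.

```python
import numpy as np

def neighbour_average(a):
    """4-neighbour mean at each pixel (wrap-around borders, for simplicity)."""
    return (np.roll(a, 1, axis=0) + np.roll(a, -1, axis=0) +
            np.roll(a, 1, axis=1) + np.roll(a, -1, axis=1)) / 4.0

def optical_flow(frame1, frame2, mu=0.01, n_iter=100):
    """Iteratively estimate the flow (u, v) between two greyscale frames."""
    I = frame1.astype(float)
    # Finite-difference estimates of the image gradients Ix, Iy, It
    Ix = np.gradient(I, axis=1)
    Iy = np.gradient(I, axis=0)
    It = frame2.astype(float) - I
    u = np.zeros_like(I)
    v = np.zeros_like(I)
    denom = 1.0 / mu + Ix**2 + Iy**2
    for _ in range(n_iter):
        u_bar = neighbour_average(u)
        v_bar = neighbour_average(v)
        # Residual of the constraint (3.3) evaluated at the averaged flow
        r = Ix * u_bar + Iy * v_bar + It
        u = u_bar - Ix * r / denom
        v = v_bar - Iy * r / denom
    return u, v
```

As a sanity check, two identical frames give It = 0 everywhere, so starting from u = v = 0 the residual r vanishes and the estimated flow stays zero at every iteration.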
3. A Multimodal Speech Recognition System
3.1 Feature Extraction and Fusion
Figure 2 shows the structure of our audiovisual multimodal speech recognition system. Speech signals are recorded at a 16 kHz sampling rate, and a speech frame