Segmenting Route Descriptions for Mobile Devices

The provision of information on mobile devices introduces interesting challenges. The most obvious of these is that ways have to be found of optimising the use of the limited available space; however, we also have to take account of the fact that, unlike

Text, Speech and Language Technology VOLUME 28

Series Editors
Nancy Ide, Vassar College, New York
Jean Véronis, Université de Provence and CNRS, France

Editorial Board
Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands
Kenneth W. Church, AT&T Bell Labs, New Jersey, USA
Judith Klavans, Columbia University, New York, USA
David T. Barnard, University of Regina, Canada
Dan Tufis, Romanian Academy of Sciences, Romania
Joaquim Llisterri, Universitat Autònoma de Barcelona, Spain
Stig Johansson, University of Oslo, Norway
Joseph Mariani, LIMSI-CNRS, France

The titles published in this series are listed at the end of this volume.

Spoken Multimodal Human-Computer Dialogue in Mobile Environments

Edited by

W. Minker
University of Ulm, Germany

Dirk Bühler
University of Ulm, Germany

and

Laila Dybkjær
University of Southern Denmark, Odense, Denmark

$$I_x = \frac{\partial I}{\partial x}, \qquad I_y = \frac{\partial I}{\partial y}, \qquad I_t = \frac{\partial I}{\partial t}, \qquad u = \frac{dx}{dt}, \qquad v = \frac{dy}{dt} \qquad (3.4)$$
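In discrete images these derivatives have to be approximated numerically. The following is a minimal sketch, assuming two consecutive grayscale frames held as NumPy float arrays and simple finite differences; the chapter's exact discretisation is not given in this excerpt.

import numpy as np

def image_gradients(frame1, frame2):
    """Approximate I_x, I_y and I_t of Equation (3.4) by finite
    differences over two consecutive grayscale float frames."""
    # Spatial derivatives: central differences, averaged over the
    # two frames for a symmetric estimate (an assumption of this
    # sketch, not the chapter's stated scheme).
    Ix = 0.5 * (np.gradient(frame1, axis=1) + np.gradient(frame2, axis=1))
    Iy = 0.5 * (np.gradient(frame1, axis=0) + np.gradient(frame2, axis=0))
    # Temporal derivative: plain difference between the frames.
    It = frame2 - frame1
    return Ix, Iy, It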

u(x, y) and v(x, y) correspond to the horizontal and vertical components of the optical flow at a point (x, y). Since we cannot determine u(x, y) and v(x, y) from Equation (3.3) alone, we incorporate another constraint, the "smoothness constraint":

$$\iint \left\{ \left( u_x^2 + u_y^2 \right) + \left( v_x^2 + v_y^2 \right) \right\} dx\, dy \rightarrow \min \qquad (3.5)$$


where $u_x$, $u_y$, $v_x$ and $v_y$ denote the partial derivatives of u and v with respect to x and y, respectively. This constraint means that the square of the magnitude of the gradient of u(x, y) and v(x, y) at each point must be minimised; in other words, the optical flow velocity must change smoothly between any two neighbouring pixels in an image. The optical flow vectors u(x, y) and v(x, y) are computed under the two constraints (3.3) and (3.5):

$$\iint \left\{ \left( u_x^2 + u_y^2 \right) + \left( v_x^2 + v_y^2 \right) + \mu \left( I_x u + I_y v + I_t \right)^2 \right\} dx\, dy \rightarrow \min \qquad (3.6)$$

where $\mu$ is a weighting factor, which is set to 0.01 throughout our experiments. This minimisation can be accomplished by a variational approach with iterations:

$$u^{k+1}_{x,y} = \bar{u}^k_{x,y} - \frac{I_x \left( I_x \bar{u}^k_{x,y} + I_y \bar{v}^k_{x,y} + I_t \right)}{1/\mu + I_x^2 + I_y^2}$$

$$v^{k+1}_{x,y} = \bar{v}^k_{x,y} - \frac{I_y \left( I_x \bar{u}^k_{x,y} + I_y \bar{v}^k_{x,y} + I_t \right)}{1/\mu + I_x^2 + I_y^2} \qquad (3.7)$$

where $u^k_{x,y}$ is the k-th estimate of the horizontal optical flow vector at (x, y), and $\bar{u}^k_{x,y}$ is the k-th estimated average of $u^k_{x,y}$ over the neighbouring pixels. $v^k_{x,y}$ and $\bar{v}^k_{x,y}$ are the corresponding vertical components. An example of the optical flow analysis is shown in Figure 1. The left image (a) is extracted from a video sequence at a certain time, and the right image (b) is the next picture. The computed optical flow velocities are shown at every 8 pixels in (c). The horizontal axis indicates the x coordinate in an image, and the vertical axis indicates the y coordinate. Each line or dot in the figure illustrates the magnitude and direction of the optical flow velocity at the corresponding position in images (a) and (b).
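The update rule above can be realised in a few lines of NumPy/SciPy. This is a sketch under the stated weighting factor $\mu = 0.01$; the 4-neighbour averaging kernel, the boundary handling and the fixed iteration count are illustrative assumptions rather than the chapter's exact choices.

import numpy as np
from scipy.ndimage import convolve

# 4-neighbour kernel used to form the local means u_bar and
# v_bar (the exact neighbourhood weighting is an assumption).
AVG_KERNEL = np.array([[0.0,  0.25, 0.0],
                       [0.25, 0.0,  0.25],
                       [0.0,  0.25, 0.0]])

def optical_flow(Ix, Iy, It, mu=0.01, n_iter=100):
    """Iteratively minimise the functional (3.6) via the update
    rule (3.7); n_iter is a hypothetical stopping point."""
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)
    denom = 1.0 / mu + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        u_bar = convolve(u, AVG_KERNEL, mode='nearest')
        v_bar = convolve(v, AVG_KERNEL, mode='nearest')
        residual = Ix * u_bar + Iy * v_bar + It
        u = u_bar - Ix * residual / denom
        v = v_bar - Iy * residual / denom
    return u, v

Combined with the earlier gradient sketch, u, v = optical_flow(*image_gradients(f1, f2)) yields a dense flow field of the kind visualised in Figure 1(c).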

3. A Multimodal Speech Recognition System

3.1 Feature Extraction and Fusion

Figure 2 shows the structure of our audio-visual multimodal speech recognition system. Speech signals are recorded at a 16 kHz sampling rate, and a speech frame
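For orientation, slicing a 16 kHz signal into overlapping frames typically looks like the following sketch; the 25 ms window and 10 ms shift are conventional illustrative values, since the chapter's actual frame parameters are not given in this excerpt.

import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25.0, shift_ms=10.0):
    """Slice a speech signal into overlapping frames. The window
    and shift lengths are hypothetical defaults, not the
    chapter's (unspecified) parameters."""
    frame_len = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    shift = int(sample_rate * shift_ms / 1000)      # 160 samples at 16 kHz
    n_frames = 1 + max(0, (len(signal) - frame_len) // shift)
    return np.stack([signal[i * shift: i * shift + frame_len]
                     for i in range(n_frames)])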