Corpus-Based Methods in Language and Speech Processing

Corpus-based methods will be found at the heart of many language and speech processing systems. This book provides an in-depth introduction to these technologies through chapters describing basic statistical modeling techniques for language and speech, th


Text, Speech and Language Technology VOLUME 2

Series Editors:
Nancy Ide, Vassar College, New York
Jean Veronis, CNRS, France

Editorial Board:
Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands
Kenneth W. Church, AT&T Bell Labs, New Jersey, USA
Judith Klavans, Columbia University, New York, USA
David T. Barnard, University of Regina, Canada
Dan Tufis, Romanian Academy of Sciences, Romania
Joaquim Llisterri, Universitat Autonoma de Barcelona, Spain
Stig Johansson, University of Oslo, Norway
Joseph Mariani, LIMSI-CNRS, France

The titles published in this series are listed at the end of this volume.

Corpus-Based Methods in Language and Speech Processing Edited by

Steve Young Cambridge University, Engineering Department, Cambridge, U.K.

and

Gerrit Bloothooft Research Institute for Language and Speech, Utrecht University, Utrecht, The Netherlands

elsnet

EUROPEAN NETWORK IN LANGUAGE AND SPEECH

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-90-481-4813-4 ISBN 978-94-017-1183-8 (eBook) DOI 10.1007/978-94-017-1183-8

Printed on acid-free paper

All Rights Reserved

© 1997 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1997

No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.

Contents

Introduction

1 Corpus-Based Statistical Methods in Speech and Language Processing
  H. Ney
    1 Introduction
    2 Automatic Systems for Speech and Language
    3 What is Statistics?
      3.1 General Remarks
      3.2 Application to Speech Processing
      3.3 Implementation of the Statistical Approach
      3.4 Advantages of the Probabilistic Framework
      3.5 The Misconception about Statistics
    4 Selected Topics
      4.1 Bayes Decision Rule and Neural Nets
      4.2 Complex Models and the EM Algorithm
      4.3 CART: Classification and Regression Trees
      4.4 Text Translation and Mixture Models
    5 Frontiers of Statistics
    6 Interpretation

2 Hidden Markov Models in Speech and Language Processing
  K. Knill & S. Young
    1 Overview of HMMs in Speech and Language Processing
      1.1 System Overview
      1.2 Hidden Markov Model
      1.3 Pattern Matching with HMMs
      1.4 Estimation of HMM Parameters
      1.5 HMM-based Recognition
      1.6 Applications
      1.7 Structure of Chapter
    2 Parameterizing Speech
      2.1 General Principles
      2.2 FFT-based Analysis
      2.3 Dynamic Coefficients
      2.4 Energy and Pre-emphasis
      2.5 LPC-based Analysis
    3 Training HMMs
      3.1 Single Gaussian HMMs
      3.2 Viterbi Training
      3.3 Baum-Welch Re-estimation
      3.4 Forward-Backward Algorithm
      3.5 Transition Probabilities
      3.6 Mixture Gaussian Output Distr