Syntactic Wordclass Tagging

In both the linguistic and the language engineering community, the creation and use of annotated text collections (or annotated corpora) is currently a hot topic. Annotated texts are of interest for research as well as for the development of natural langu




Text, Speech and Language Technology VOLUME 9

Series Editors

Nancy Ide, Vassar College, New York
Jean Veronis, Universite de Provence and CNRS, France

Editorial Board

Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands
Kenneth W. Church, AT&T Bell Labs, New Jersey, USA
Judith Klavans, Columbia University, New York, USA
David T. Barnard, University of Regina, Canada
Dan Tufis, Romanian Academy of Sciences, Romania
Joaquim Llisterri, Universitat Autonoma de Barcelona, Spain
Stig Johansson, University of Oslo, Norway
Joseph Mariani, LIMSI-CNRS, France

Syntactic Wordclass Tagging

edited by

Hans van Halteren
University of Nijmegen

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-90-481-5296-4
ISBN 978-94-015-9273-4 (eBook)
DOI 10.1007/978-94-015-9273-4

Printed on acid-free paper

All Rights Reserved
© 1999 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1999
Softcover reprint of the hardcover 1st edition 1999
No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.

Contents

Preface

Contributing Authors

Part I The User's View

1 Orientation
Atro Voutilainen
  1.1 Morphosyntactic tags
  1.2 Automatic tagging

2 A Short History of Tagging
Atro Voutilainen
  2.1 Approaches to wordclass tagging
  2.2 Pioneering work
  2.3 The breakthrough of data-driven methods
    2.3.1 N-gram taggers
    2.3.2 Data-driven local rules
  2.4 Recent work in the data-driven approach
    2.4.1 Hidden Markov Models
    2.4.2 Recent work on data-driven local rules
    2.4.3 Neural taggers


    2.4.4 Case-based taggers
    2.4.5 Combined data-driven taggers
  2.5 Recent work in the linguistic approach
    2.5.1 English Constraint Grammar
    2.5.2 A rule-based tagger of Turkish
    2.5.3 A finite-state tagger of French
    2.5.4 A syntax-based tagger of English
  2.6 The current situation

3 The Use of Tagging
Geoffrey Leech and Nicholas Smith
  3.1 Introduction
  3.2 Tagging in corpus linguistics
    3.2.1 Adding further annotations
    3.2.2 Information extraction
  3.3 Practical applications
    3.3.1 Uses of tagging software
    3.3.2 Uses of tagged text

4 Tagsets
Jan Cloeren
  4.1 Introduction
  4.2 Information contents of the tags in the tagset
    4.2.1 Morphosyntactic tags
    4.2.2 Syntactic tags
    4.2.3 Semantic and discourse tags
    4.2.4 Distributional similarity tags
  4.3 Special problems in the application of tagsets
    4.3.1 Multi-unit tokens and multi-token units
    4.3.2 Underspecification and ambiguity
  4.4 Notation
    4.4.1 Class and feature value names
    4.4.2 Structure of tags
    4.4.3 Positioning of tags
    4.4.4 SGML/TEI guidelines for tags

5 Standards for Tagsets
Geoffrey Leech and Andrew Wilson
  5.1 Introduction
  5.2 Recommendations for morphosyntactic (wordclass) categories
    5.2.1 Reason