Syntactic Wordclass Tagging
In both the linguistic and the language engineering communities, the creation and use of annotated text collections (or annotated corpora) is currently a hot topic. Annotated texts are of interest for research as well as for the development of natural language …
Text, Speech and Language Technology VOLUME 9
Series Editors
Nancy Ide, Vassar College, New York
Jean Veronis, Universite de Provence and CNRS, France
Editorial Board
Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands
Kenneth W. Church, AT&T Bell Labs, New Jersey, USA
Judith Klavans, Columbia University, New York, USA
David T. Barnard, University of Regina, Canada
Dan Tufis, Romanian Academy of Sciences, Romania
Joaquim Llisterri, Universitat Autonoma de Barcelona, Spain
Stig Johansson, University of Oslo, Norway
Joseph Mariani, LIMSI-CNRS, France
Syntactic Wordclass Tagging
edited by
Hans van Halteren
University of Nijmegen
SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 978-90-481-5296-4
ISBN 978-94-015-9273-4 (eBook)
DOI 10.1007/978-94-015-9273-4
Printed on acid-free paper
All Rights Reserved
© 1999 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1999
Softcover reprint of the hardcover 1st edition 1999
No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.
Contents
Preface
Contributing Authors
Part I The User's View
1 Orientation
Atro Voutilainen
1.1 Morphosyntactic tags
1.2 Automatic tagging
2 A Short History of Tagging
Atro Voutilainen
2.1 Approaches to wordclass tagging
2.2 Pioneering work
2.3 The breakthrough of data-driven methods
2.3.1 N-gram taggers
2.3.2 Data-driven local rules
2.4 Recent work in the data-driven approach
2.4.1 Hidden Markov Models
2.4.2 Recent work on data-driven local rules
2.4.3 Neural taggers
2.4.4 Case-based taggers
2.4.5 Combined data-driven taggers
2.5 Recent work in the linguistic approach
2.5.1 English Constraint Grammar
2.5.2 A rule-based tagger of Turkish
2.5.3 A finite-state tagger of French
2.5.4 A syntax-based tagger of English
2.6 The current situation
3 The Use of Tagging
Geoffrey Leech and Nicholas Smith
3.1 Introduction
3.2 Tagging in corpus linguistics
3.2.1 Adding further annotations
3.2.2 Information extraction
3.3 Practical applications
3.3.1 Uses of tagging software
3.3.2 Uses of tagged text
4 Tagsets
Jan Cloeren
4.1 Introduction
4.2 Information contents of the tags in the tagset
4.2.1 Morphosyntactic tags
4.2.2 Syntactic tags
4.2.3 Semantic and discourse tags
4.2.4 Distributional similarity tags
4.3 Special problems in the application of tagsets
4.3.1 Multi-unit tokens and multi-token units
4.3.2 Underspecification and ambiguity
4.4 Notation
4.4.1 Class and feature value names
4.4.2 Structure of tags
4.4.3 Positioning of tags
4.4.4 SGML/TEI guidelines for tags
5 Standards for Tagsets
Geoffrey Leech and Andrew Wilson
5.1 Introduction
5.2 Recommendations for morphosyntactic (wordclass) categories
5.2.1 Reason