Treebanks: Building and Using Parsed Corpora
Linguists and engineers in Natural Language Processing increasingly use electronic corpora. Most research has long been limited to raw (unannotated) texts or to tagged texts (annotated with parts of speech only), but these approaches suffer from
Text, Speech and Language Technology
Volume 20
 
Series Editors
Nancy Ide, Vassar College, New York
Jean Véronis, Université de Provence and CNRS, France

Editorial Board
Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands
Kenneth W. Church, AT&T Bell Labs, New Jersey, USA
Judith Klavans, Columbia University, New York, USA
David T. Barnard, University of Regina, Canada
Dan Tufis, Romanian Academy of Sciences, Romania
Joaquim Llisterri, Universitat Autònoma de Barcelona, Spain
Stig Johansson, University of Oslo, Norway
Joseph Mariani, LIMSI-CNRS, France
 
 The titles published in this series are listed at the end of this volume.
 
Treebanks: Building and Using Parsed Corpora
Edited by Anne Abeillé
Université Paris 7, Paris, France
 
 Springer Science+Business Media, LLC
 
 A C.I.P. Catalogue record for this book is available from the Library of Congress.
 
 ISBN 978-1-4020-1335-5 ISBN 978-94-010-0201-1 (eBook) DOI 10.1007/978-94-010-0201-1
 
 Printed on acid-free paper
 
All Rights Reserved. © 2003 Springer Science+Business Media New York. Originally published by Kluwer Academic Publishers in 2003. Softcover reprint of the hardcover 1st edition 2003. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
 
 Contents
 
Preface

Introduction
Anne Abeillé
1 Building treebanks
2 Using treebanks

Part I: Building treebanks
 
ENGLISH TREEBANKS

Chapter 1 THE PENN TREEBANK: AN OVERVIEW
Ann Taylor, Mitchell Marcus, Beatrice Santorini
1 The annotation schemes
2 Methodology
3 Conclusions
 
Chapter 2 THOUGHTS ON TWO DECADES OF DRAWING TREES
Geoffrey Sampson
1 Historical background
2 Building treebanks
3 Exploiting the SUSANNE Treebank
4 Small is beautiful
5 Annotating a spoken corpus
6 Using the CHRISTINE Corpus
7 Conclusion
 
Chapter 3 BANK OF ENGLISH AND BEYOND
Timo Järvinen
1 Introduction
2 Annotating 200 million words
3 ENGCG Syntax
4 FDG parser
5 Conclusion
 
 
Chapter 4 COMPLETING PARSED CORPORA: FROM CORRECTION TO EVOLUTION
Sean Wallis
1 Introduction
2 Conventional post-correction
3 A paradigm shift: transverse correction
4 Critique
 
 GERMAN TREEBANKS
 
Chapter 5 SYNTACTIC ANNOTATION OF A GERMAN NEWSPAPER CORPUS
Thorsten Brants, Wojciech Skut, Hans Uszkoreit
1 Introduction
2 Treebank development
3 Corpus annotation
4 Applications
5 Conclusions
Appendix: Tagsets
 
Chapter 6 ANNOTATION OF ERROR TYPES FOR A GERMAN NEWSGROUP CORPUS
Markus Becker, Andrew Bredenkamp, Berthold Crysmann, Judith Klein
1 Introduction
2 Corpus Description
3 Annotation Strategy
4 Annotation Tools
5 Evaluation
6 First Results
7 Conclusion