Word Frequency Distributions
This book is an introduction to the statistical analysis of word frequency distributions, intended for linguists, psycholinguists, researchers working in the field of quantitative stylistics, and anyone interested in quantitative aspects of lexical
Text, Speech and Language Technology, Volume 18
Series Editors: Nancy Ide, Vassar College, New York; Jean Véronis, Université de Provence and CNRS, France
Editorial Board: Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands; Kenneth W. Church, AT&T Bell Labs, New Jersey, USA; Judith Klavans, Columbia University, New York, USA; David T. Barnard, University of Regina, Canada; Dan Tufis, Romanian Academy of Sciences, Romania; Joaquim Llisterri, Universitat Autònoma de Barcelona, Spain; Stig Johansson, University of Oslo, Norway; Joseph Mariani, LIMSI-CNRS, France
The titles published in this series are listed at the end of this volume.
Word Frequency Distributions, by R. Harald Baayen, University of Nijmegen, The Netherlands
SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 978-1-4020-0927-3
ISBN 978-94-010-0844-0 (eBook)
DOI 10.1007/978-94-010-0844-0
Printed on acid-free paper
All Rights Reserved. © 2001 Springer Science+Business Media Dordrecht. Originally published by Kluwer Academic Publishers in 2001. Softcover reprint of the hardcover 1st edition 2001. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.
to the memory of Rezo Chitashvili
Contents
List of Figures . . . ix
List of Tables . . . xix
Introduction . . . xxi
1 Word Frequencies . . . 1
1.1 Introduction . . . 2
1.2 The frequency spectrum . . . 8
1.3 Zipf . . . 13
1.4 The quest for characteristic constants . . . 24
1.5 The lognormal distribution . . . 32
1.6 Discussion . . . 34
1.7 Bibliographical Comments . . . 35
1.8 Questions . . . 35
2 Non-parametric models . . . 39
2.1 Basic concepts . . . 39
2.2 The Urn model . . . 42
2.3 The Structural Type Distribution . . . 47
2.4 The LNRE zone . . . 51
2.5 Good-Turing estimates . . . 57
2.6 Interpolation and Extrapolation . . . 63
2.6.1 Interpolation . . . 64
2.6.2 Extrapolation . . . 69
2.7 Discussion . . . 76
2.8 Bibliographical Comments . . . 76
2.9 Questions . . . 77
3 Parametric models . . . 79
3.1 Introduction . . . 79
3.2 LNRE models . . . 82
3.2.1 The Lognormal Structural Type Distribution . . . 82
3.2.2 The Generalized Inverse Gauss-Poisson Structural Type Distribution . . . 89
3.2.3 The Zipfian Family of LNRE Models . . . 93
3.3 Evaluating Goodness of Fit . . . 118
3.4 Parameter estimation . . . 122
3.5 A comparative study . . . 124
3.6 Comparing Lexical Measures Across Texts . . . 132
3.7 Discussion . . . 132
3.8 Bibliographical Comments . . . 133
3.9 Questions . . . 133
4 Mixture distributions
4.1 Introduction
4.2 Expectations, variances, and covariances
4.3 Examples of mixture distributions
4.3.