Natural Language Information Retrieval

The last decade has been one of dramatic progress in the field of Natural Language Processing (NLP). This hitherto largely academic discipline has found itself at the center of an information revolution ushered in by the Internet age, as demand for human-


Text, Speech and Language Technology, Volume 7

Series Editors:
Nancy Ide, Vassar College, New York
Jean Veronis, Universite de Provence and CNRS, France

Editorial Board:
Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands
Kenneth W. Church, AT&T Bell Labs, New Jersey, USA
Judith Klavans, Columbia University, New York, USA
David T. Barnard, University of Regina, Canada
Dan Tufis, Romanian Academy of Sciences, Romania
Joaquim Llisterri, Universitat Autonoma de Barcelona, Spain
Stig Johansson, University of Oslo, Norway
Joseph Mariani, LIMSI-CNRS, France

The titles published in this series are listed at the end of this volume.

Natural Language Information Retrieval

Edited by

Tomek Strzalkowski
General Electric, Research & Development

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-94-017-2388-6 (eBook) ISBN 978-90-481-5209-4 DOI 10.1007/978-94-017-2388-6

Printed on acid-free paper

All Rights Reserved. © 1999 Springer Science+Business Media Dordrecht. Originally published by Kluwer Academic Publishers in 1999. No part of the material protected by this …

… 1 were given to the text categorization algorithms for training.

EXTRACTION-BASED TEXT CATEGORIZATION

1. exploded
2. murder of
3. assassination of
4. was killed
5. was kidnapped
6. attack on
7. was injured
8. exploded in
9. death of
10. took_place
11. caused
12. claimed
13. was wounded
14. occurred
15. was located
16. took_place on
17. responsibility for
18. occurred on
19. was wounded in
20. destroyed
21. was murdered
22. one of
23. kidnapped
24. exploded on
25. died

Figure 5. The top 25 extraction patterns

AutoSlog-TS is the first system that can generate extraction patterns using only raw text as input: the texts need no annotations, but AutoSlog-TS does need both relevant and irrelevant sample texts to decide which patterns are most strongly associated with the domain. Not coincidentally, the preclassified corpus needed for AutoSlog-TS is exactly the same input that is required for the text categorization algorithms. We exploit the preclassified texts by processing them twice: once to generate extraction patterns and once to apply the extraction patterns to the texts. The extracted information, in the form of signatures and role fillers, is then analyzed statistically to identify classification terms that are highly correlated with a category.
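To make the pattern-ranking step concrete, here is a minimal Python sketch. It assumes each preclassified text has already been reduced to the set of extraction patterns that fired in it; the function name `rank_patterns` and this input format are hypothetical. The scoring function is also an assumption: a relevance rate (the fraction of texts containing the pattern that are relevant) multiplied by log2 of the pattern's frequency in relevant texts, a ranking in the spirit of AutoSlog-TS rather than the exact formula behind Figure 5.

```python
import math
from collections import Counter


def rank_patterns(texts):
    """Rank extraction patterns by association with relevant texts.

    texts: list of (patterns, is_relevant) pairs, one per preclassified
    text, where patterns holds the patterns that fired in that text.
    Returns (pattern, score) pairs, highest score first.
    """
    total = Counter()     # number of texts in which each pattern fired
    relevant = Counter()  # number of relevant texts in which it fired
    for patterns, is_relevant in texts:
        for pattern in set(patterns):  # count each pattern once per text
            total[pattern] += 1
            if is_relevant:
                relevant[pattern] += 1

    scores = {}
    for pattern, n in total.items():
        rel = relevant[pattern]
        rate = rel / n  # estimate of P(relevant | pattern)
        # Assumed score: relevance rate * log2(relevant frequency).
        scores[pattern] = rate * math.log2(rel) if rel > 0 else 0.0

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus: (patterns that fired, is the text relevant?)
corpus = [
    (["exploded", "was killed"], True),
    (["exploded"], True),
    (["took_place"], False),
    (["exploded", "took_place"], False),
]
for pattern, score in rank_patterns(corpus):
    print(f"{pattern:12s} {score:.2f}")
```

On this toy corpus `exploded` ranks first with a score of about 0.67. The log factor deliberately favors patterns that fire often in relevant texts, so a pattern seen in only one relevant text (here `was killed`) scores 0 even though its relevance rate is 1.0.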

4. Word-augmented relevancy signatures

Augmenting relevancy signatures with semantic features produced much better results than relevancy signatures alone in the MUC-4 terrorism domain (Riloff and Lehnert, 1994). But there was a price to pay: augmented relevancy signatures need a semantic feature hierarchy and a dictionary of words tagged with semantic features. Consequently, using augmented relevancy signatures in a new domain requires an initial time investment that might not be acceptable for many applications. To eliminate the need for semantic features, we investigated whether the role fillers could be represented using lexical item …
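A minimal sketch of how word-augmented relevancy signatures might be selected and applied, assuming each role filler is represented by its head noun so that a signature becomes a (pattern, word) pair. The selection criterion (keep a signature if its estimated probability of relevance is at least a threshold R and it occurs in at least M training texts) follows the usual relevancy-signature formulation but is an assumption here, as are the function names `select_signatures` and `classify`, the thresholds, and the toy data.

```python
from collections import Counter


def select_signatures(texts, r_threshold, m_threshold):
    """Keep signatures that are frequent and strongly tied to relevance.

    texts: list of (signatures, is_relevant) pairs; a word-augmented
    signature is assumed to be a (pattern, head_word) tuple.
    """
    total = Counter()     # texts containing each signature
    relevant = Counter()  # relevant texts containing each signature
    for signatures, is_relevant in texts:
        for sig in set(signatures):  # count each signature once per text
            total[sig] += 1
            if is_relevant:
                relevant[sig] += 1
    return {
        sig
        for sig, n in total.items()
        # frequency >= M and estimated P(relevant | sig) >= R
        if n >= m_threshold and relevant[sig] / n >= r_threshold
    }


def classify(signatures, reliable):
    """Label a text relevant if it contains any reliable signature."""
    return any(sig in reliable for sig in signatures)


# Hypothetical training data: (pattern, head noun of the role filler)
training = [
    ([("murder of", "mayor")], True),
    ([("murder of", "mayor"), ("took_place", "meeting")], True),
    ([("took_place", "meeting")], False),
    ([("murder of", "mayor")], True),
]
reliable = select_signatures(training, r_threshold=0.9, m_threshold=2)
print(reliable)
print(classify([("took_place", "meeting")], reliable))
```

Here `("murder of", "mayor")` occurs in three texts, all relevant, so it is kept; `("took_place", "meeting")` is relevant only half the time and is dropped, so a text containing only that signature is classified as irrelevant. Raising R in this sketch trades recall for precision.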
