Information Extraction Towards Scalable, Adaptable Systems

Information extraction (IE) is a new technology enabling relevant content to be extracted from textual information available electronically. IE essentially builds on natural language processing and computational linguistics, but it is also closely related



Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen

1714


Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo

Maria Teresa Pazienza (Ed.)

Information Extraction Towards Scalable, Adaptable Systems


Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editor Maria Teresa Pazienza Department of Computer Science, Systems and Production University of Roma, Tor Vergata Via di Tor Vergata, I-00133 Roma, Italy E-mail: [email protected]

Cataloging-in-Publication data applied for Die Deutsche Bibliothek - CIP-Einheitsaufnahme Information extraction : towards scalable, adaptable systems / Maria Teresa Pazienza (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999 (Lecture notes in computer science ; 1714 : Lecture notes in artificial intelligence) ISBN 3-540-66625-7

CR Subject Classification (1998): I.2, H.3
ISBN 3-540-66625-7 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1999
Printed in Germany
Typesetting: Camera-ready by author
SPIN: 10705092 06/3142 – 5 4 3 2 1 0

Printed on acid-free paper

Preface

The ever-growing interest in new approaches to information management is strictly related to the explosion of collections of documents made accessible through communication networks. The enormous amount of daily available information imposes the development of IE (Information Extraction) technologies that enable one to access relevant documents only and to integrate the extracted information into the user's environment. In fact, the classic application scenario for IE foresees, for example:

1. a company interested in getting detailed synthetic information related to predefined categories;
2. the documents, as sources of information, located in electronically accessible sites (agencies' news, web pages, companies' textual documentation, international regulations, etc.);
3. the extracted information eventually being inserted in private data bases for further processing (e.g. data mining, summary and report generation, forms filling, ...).

A key problem for a wider deployment of IE systems is in their flexibility and easy adaptation to new application frameworks. Most of the commonly available IE systems are based on specific domain-dependent methodologies for know