Information Extraction Based on Event Driven from Template Web Pages

Xiuhong Zhang and Zhe Gong

Abstract To acquire real-time information about catastrophic events in the emergency field (e.g., place, time, and casualties), this paper puts forward a template for filtering Web page noise. It also studies how to apply the Bootstrapping algorithm to the emergency field and extract information from Web pages in an event-driven manner, expanding the scope of information extraction while keeping the information real-time. Experimental results demonstrate that, compared with traditional Web information extraction methods, this method achieves high accuracy and efficiency, largely meets the real-time requirement, and can be successfully applied in the emergency field.

Keywords Emergency field · Information extraction template · Bootstrapping algorithm · Event driven · Web

X. Zhang (corresponding author) School of Computer, BeiHang University, XueYuan Road No.37, HaiDian District, Beijing, China e-mail: [email protected]

Z. Gong Beijing Institute of Spacecraft Environment Engineering, China Academy of Space Technology, XueYuan Road No.37, HaiDian District, Beijing, China

W. Lu et al. (eds.), Proceedings of the 2012 International Conference on Information Technology and Software Engineering, Lecture Notes in Electrical Engineering 211, DOI: 10.1007/978-3-642-34522-7_55, Springer-Verlag Berlin Heidelberg 2013

55.1 Introduction

In recent years, as the Internet has grown rapidly and its advantages for transmitting information have become apparent, Web information has attracted the attention of governments and departments at all levels. The emergency field in particular suffers from "information delay" and "information failure" problems. Extracting real-time catastrophic information related to a disaster event from Web pages (for example, place, time, and casualties) provides a basis for decision making in the emergency field. In addition, professionals can obtain public opinion information, gain a comprehensive grasp of public opinion dynamics, and guide public opinion appropriately.

Web information extraction differs from common text extraction because information on the Internet mostly appears in the form of Web pages, and the structure of Web pages is constantly changing. In other words, Web information extraction is a kind of text information extraction with variable structure. Existing methods of Web information extraction are mainly based on statistical theory, visual features, DOM tree structure, and Web templates; the last two are widely used. Although the method based on DOM tree structure has a higher degree of automation than the one based on Web templates, the algorithm based on Web templates is superior in applicable scope and complexity, and this method avoids the repetitive computation on similar Web pages, summarizes unified extr
