Information Extraction Based on Event Driven from Template Web Pages
Xiuhong Zhang and Zhe Gong
Abstract To acquire real-time information about catastrophic events in the emergency field (e.g., place, time, and casualties), this paper puts forward a template for filtering Web page noise. It also studies how to apply the Bootstrapping algorithm to the emergency field and how to extract information from Web pages in an event-driven manner, expanding the scope of information extraction while keeping the information real-time. Experimental results demonstrate that, compared with traditional Web information extraction methods, this method achieves high accuracy and efficiency, largely meets the real-time requirement, and can be successfully applied in the emergency field.
Keywords Emergency field · Information extraction template · Bootstrapping algorithm · Event driven · Web
X. Zhang (✉)
School of Computer, BeiHang University, XueYuan Road No.37, HaiDian District, Beijing, China
e-mail: [email protected]
Z. Gong
Beijing Institute of Spacecraft Environment Engineering, China Academy of Space Technology, XueYuan Road No.37, HaiDian District, Beijing, China
W. Lu et al. (eds.), Proceedings of the 2012 International Conference on Information Technology and Software Engineering, Lecture Notes in Electrical Engineering 211, DOI: 10.1007/978-3-642-34522-7_55, Springer-Verlag Berlin Heidelberg 2013
55.1 Introduction
In recent years, as the scale of the Internet has surged and the advantages of network information transmission have become apparent, Web information has drawn the attention of governments and departments at all levels. The emergency field in particular suffers from “information delay” and “information failure” problems. Extracting real-time information about a disaster event from Web pages (for example, place, time, and casualties) provides a basis for decision making in the emergency field. In addition, professionals can obtain public opinion information, gain a comprehensive grasp of public opinion dynamics, and guide public opinion appropriately.
Web information extraction differs from ordinary text extraction because information on the Internet mostly appears in the form of Web pages, whose structure is constantly changing; in other words, Web information extraction is text information extraction with variable structure. Existing methods of Web information extraction are mainly based on statistical theory, visual features, DOM tree structure, or Web templates, and the last two are widely used. Although the DOM-tree-based method has a higher degree of automation than the template-based one, the template-based algorithm is superior in applicable scope and complexity, and it avoids repetitive computation on similar Web pages, summarizes unified extr