Extracting Knowledge Using Wikipedia Semi-structured Resources
Abstract. Automatic knowledge discovery has been an active research field for years. Knowledge can be extracted from source files with different data structures, using different types of resources. In this paper, we propose a pattern-based extraction approach that exploits Wikipedia semi-structured data to extract the implicit knowledge behind any unstructured text. The proposed approach first identifies the concepts of the studied text and then extracts their corresponding common sense and basic knowledge. We explored the effectiveness of our knowledge extraction model on textual sources from the city domain. An initial evaluation indicates that the approach performs well.
Keywords: Wikipedia semi-structured resources · Knowledge discovery · Common sense knowledge
1 Introduction
Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data [2]. Knowledge can be obtained from sources with different types of data structures: unstructured, structured, and semi-structured. While structured and semi-structured sources have predefined data models, unstructured data has no organization that facilitates extraction. Unstructured data files nevertheless often contain a considerable amount of knowledge, which can be used in different applications of Artificial Intelligence. In knowledge discovery, resources with different types of structures can be exploited [4]. Like source data, resources come in three types. Machine-readable structured resources, such as thesauri, are easy to exploit but difficult to create and maintain. Because of these difficulties, they may not cover all domains and languages. Unstructured resources, on the other hand, are collections of machine-unreadable multimedia content, and extracting reliable knowledge from them is very challenging. Hence, structured resources yield knowledge with high accuracy but low coverage, while unstructured resources cover all domains but yield less reliable knowledge. To combine the strengths of each type and reduce their limitations, in this paper we focus on semi-structured resources. Wikipedia is one of the major resources; it is updated regularly and contains many statements in natural language. In this work, we exploit category names and infobox tables of Wikipedia as semi-structured resources in order to extract the implicit knowledge behind any given unstructured text. Two kinds of knowledge are targeted: basic and common sense (CS). By basic knowledge, we mean any knowledge that provides basic information about the studied concept. Taking Paris as an example concept, information about its population, mayor, etc., can be considered basic knowledge. Common sense knowledge, however, is defined as the background knowledge that

N. Firoozeh. © Springer International Publishing Switzerland 2016. E. Métais et al. (Eds.): NLDB 2016, LNCS 9612, pp. 249–257, 2016. DOI: 10.1007/978-3-319-41754-7_22