Extracting Knowledge Using Wikipedia Semi-structured Resources


Abstract. Automatic knowledge discovery has been an active research field for years. Knowledge can be extracted from source files with different data structures, using different types of resources. In this paper, we propose a pattern-based extraction approach that exploits Wikipedia semi-structured data to extract the implicit knowledge behind any unstructured text. The proposed approach first identifies the concepts of the studied text and then extracts their corresponding common sense and basic knowledge. We explored the effectiveness of our knowledge extraction model on textual sources from the city domain. An initial evaluation indicates that the approach performs well.

Keywords: Wikipedia semi-structured resources · Knowledge discovery · Common sense knowledge

1 Introduction

Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data [2]. Knowledge can be obtained from sources with different types of data structures: unstructured, structured and semi-structured. While structured and semi-structured sources have predefined data models, unstructured data has no organization to facilitate the task of extraction. Unstructured data files often contain a considerable amount of knowledge, which can be used in different applications of Artificial Intelligence.

In knowledge discovery, resources with different types of structures can be exploited [4]. Like source data, resources come in three types. Machine-readable structured resources, such as thesauri, are easy to exploit but difficult to create and maintain. Due to these difficulties, they may not cover all domains and languages. Unstructured resources, on the other hand, are collections of machine-unreadable multimedia content, and extracting reliable knowledge from such resources is a very challenging task. Hence, structured resources yield knowledge with high accuracy but low coverage, while unstructured resources cover all domains but yield less reliable knowledge. To combine the strengths of each type while reducing their limitations, in this paper we focus on semi-structured resources. Wikipedia is one of the major resources, which is updated regularly and contains many statements in natural language.

In this work, we exploit category names and infobox tables of Wikipedia as semi-structured resources in order to extract the implicit knowledge behind any given unstructured text. Two kinds of knowledge are targeted in our work: basic and common sense (CS). By basic knowledge, we mean any kind of knowledge that provides basic information about the studied concept. Taking Paris as an example of a concept, information about its population, mayor, etc., can be considered basic knowledge. Common sense knowledge, however, is defined as the background knowledge that

N. Firoozeh. © Springer International Publishing Switzerland 2016. E. Métais et al. (Eds.): NLDB 2016, LNCS 9612, pp. 249–257, 2016. DOI: 10.1007/978-3-319-41754-7_22
