GeoSR: Geographically Explore Semantic Relations in World Knowledge


Brent Hecht, Martin Raubal
Department of Geography, University of California, Santa Barbara
1832 Ellison Hall, Santa Barbara, CA 93106-4060
{bhecht, raubal}@geog.ucsb.edu

Abstract. Methods to determine the semantic relatedness (SR) value between two lexically expressed entities abound in the field of natural language processing (NLP). The goal of such efforts is to identify a single measure that summarizes the number and strength of the relationships between the two entities. In this paper, we present GeoSR, the first adaptation of SR methods to the context of geographic data exploration. By combining the first use of a knowledge repository structure that is replete with non-classical relations, a new means of explaining those relations to users, and the novel application of SR measures to a geographic reference system, GeoSR allows users to geographically navigate and investigate the world knowledge encoded in Wikipedia. There are numerous visualization and interaction paradigms possible with GeoSR; we present one implementation as a proof-of-concept and discuss others. Although Wikipedia is used as the knowledge repository for our implementation, GeoSR will also work with any knowledge repository having a similar set of properties.

Keywords: semantic relatedness, natural language processing, geographic reference system, GeoSR, Wikipedia

1 Introduction and Related Work

In today’s information-overloaded world, researchers in both the academic and professional communities, students, policy analysts, and people in many other fields frequently find themselves trying to locate a useful needle of information in a haystack of data. This search is often
aided by the use of a spatial lens, as up to 80 percent of human decisions affect space or are affected by spatial situations (Albaredes 1992). For example, a student doing a project on Judaism, love, George W. Bush, Berlin, or any other concept or named entity will likely want to know which places are most related to that concept or entity, and why. GeoSR provides users with a novel method of easily accomplishing this task.

1.1 GeoSR and Wikipedia

GeoSR uses Wikipedia as its knowledge repository. Nearly every paper produced by the burgeoning Wikipedia research community introduces the phenomenon that is Wikipedia in its own way. However, these papers all seem to agree on several vital properties. First, Wikipedia is a free encyclopedia that is produced via a collaborative effort by its contributors. Second, Wikipedia is highly multilingual, with editions in hundreds of languages. Third, Wikipedia is enormous and is, by far, the largest encyclopedia the world has ever seen. Indeed, as of October 2007, Wikipedias in 14 languages had over 100,000 articles, and the largest Wikipedia, English, had over 2.05 million. Finally, many researchers argue that Wikipedia “has probably become the largest collection of freely available knowledge” (Zesch et al. 2007a, p. 1). The above facts are all relatively well