A Chinese expert disambiguation method based on semi-supervised graph clustering
ORIGINAL ARTICLE
Jin Jiang • Xin Yan • Zhengtao Yu • Jianyi Guo • Wei Tian
Received: 26 October 2013 / Accepted: 9 April 2014
© Springer-Verlag Berlin Heidelberg 2014
Abstract To make efficient use of the associated relationships found in expert pages, we propose a Chinese expert name disambiguation method based on semi-supervised graph clustering that integrates several kinds of associated relationships. First, the correlation characteristics of the expert attributes are extracted through a correlation analysis of the expert pages. Second, a similarity matrix between the documents on different expert pages is constructed from the attribute characteristics and the associated relationships of the expert pages. Finally, with attribute correlation serving as the semi-supervised constraint, an expert disambiguation model is built using a graph-based clustering approach, and the model is solved with a kernel-based method to achieve expert name disambiguation. Comparative experiments on Chinese expert disambiguation show that semi-supervised graph clustering integrated with the expert-associated relationships yields a much better disambiguation effect.
Keywords Chinese expert name disambiguation · Semi-supervised graph clustering · Associated relationship
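The three steps of the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes that each page is already reduced to a numeric feature vector, that page similarity is cosine similarity, that the semi-supervised constraints are given as must-link and cannot-link page pairs, and that the kernel-based solver is kernel k-means on a diagonally shifted matrix, which is one common formulation. All function names, the constraint weight, and the toy data are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (one row per page)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty pages
    Xn = X / norms
    return Xn @ Xn.T

def constrained_kernel(S, must_link, cannot_link, weight=0.5):
    """Reward must-link pairs, penalize cannot-link pairs, then shift the
    diagonal just enough to make the matrix positive semidefinite."""
    K = S.astype(float).copy()
    for i, j in must_link:
        K[i, j] += weight
        K[j, i] += weight
    for i, j in cannot_link:
        K[i, j] -= weight
        K[j, i] -= weight
    min_eig = np.linalg.eigvalsh(K).min()
    if min_eig < 0:
        K += (-min_eig + 1e-9) * np.eye(K.shape[0])
    return K

def kernel_kmeans(K, k, init_labels, max_iter=100):
    """Kernel k-means: reassign each page to the cluster whose implicit
    centroid is nearest in the feature space induced by K."""
    labels = np.asarray(init_labels).copy()
    n = K.shape[0]
    diag = np.diag(K)
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)   # empty clusters stay at infinity
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue
            dist[:, c] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data (hypothetical): pages 0-2 belong to one expert, pages 3-5 to another.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2],
              [0, 0, 1, 1]], dtype=float)
S = cosine_similarity_matrix(X)
K = constrained_kernel(S, must_link=[(0, 1)], cannot_link=[(2, 3)])
# Page 2 starts in the wrong cluster and is moved during the iterations.
labels = kernel_kmeans(K, k=2, init_labels=[0, 0, 1, 1, 1, 1])
print(labels)
```

The diagonal shift keeps the objective monotone, but a large shift also makes pages reluctant to change cluster, so in this sketch the constraint weight is kept small relative to the similarities.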
J. Jiang · X. Yan · Z. Yu (✉) · J. Guo · W. Tian
School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
e-mail: [email protected]
J. Jiang · X. Yan · Z. Yu · J. Guo · W. Tian
Key Lab of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, China
1 Introduction
Name repetition is now widespread on the internet, which makes expert name disambiguation imperative. In current research on expert disambiguation, the main idea is still to transform expert name disambiguation into a clustering problem. The main disambiguation methods fall into the following kinds. The first is clustering disambiguation based on feature-vector similarity; for example, Wang [1] proposed using a vector space model of web content to cluster expert evidence pages, in order to solve the multi-document coreference resolution problem. The second is clustering disambiguation based on attribute similarity; for example, Cohen [2] achieved expert clustering disambiguation by calculating the similarity of attributes. The third is clustering disambiguation based on specific relationships; for example, Lang [3] presented a name disambiguation approach based on social networks, which builds social networks by exploiting the name co-occurrence relationships of evidence-page titles in the context fragments, and uses clustering to realize disambiguation.