Discovery of topic flows of authors

Young-Seob Jeong¹ · Sang-Hun Lee² · Gahgene Gweon³ · Ho-Jin Choi⁴

© The Author(s) 2017. This article is an open access publication

Corresponding author: Ho-Jin Choi

¹ Soonchunhyang University, Asan-si, South Korea
² Agency for Defense Development, Daejeon, South Korea
³ Seoul National University, Seoul, South Korea
⁴ Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, South Korea

Abstract

With the increase in the number of Web documents, the number of methods proposed for knowledge discovery from Web documents has increased as well. Because these documents do not always provide keywords or categories, unsupervised approaches are desirable; topic modeling is one such approach, discovering knowledge without using labels. Furthermore, Web documents usually carry time information, such as publication years, so knowledge patterns over time can be captured by incorporating that information. These temporal patterns can be used to build useful services, such as a graph of research trends, finding authors similar to a particular author (potential co-authors), or finding the top researchers in a specific research domain. In this paper, we propose a new topic model, the Author Topic-Flow (ATF) model, whose objective is to capture how the research interests of authors change over time, where each topic is associated with a research domain. The state-of-the-art Temporal Author Topic model has the same objective as ours, but it computes the temporal patterns of authors by combining the patterns of topics. We believe that such ‘indirect’ temporal patterns will be poorer than the ‘direct’ temporal patterns of our proposed model. The ATF model gives each author a separate variable that models the author's temporal patterns, so we call it a ‘direct’ topic flow. The design of the ATF model is based on the hypothesis that ‘direct’ topic flows are better than ‘indirect’ topic flows. We prove that this hypothesis holds through a structural comparison between the two models and demonstrate the effectiveness of the ATF model with empirical results.

Keywords: Probabilistic topic model · Topic flow · Knowledge discovery
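To make the ‘indirect’ versus ‘direct’ distinction concrete, the following Python sketch uses hypothetical toy numbers. It does not reproduce the Temporal Author Topic model or the ATF model. It assumes that the indirect model derives an author's topic share at time slice t from a per-topic temporal profile p(t | z), shared by all authors, and a static author mixture p(z | a). It also assumes that the direct model gives each author its own per-slice topic distribution p(z | a, t). All names (`TOPIC_TIME`, `AUTHOR_TOPIC`, `AUTHOR_FLOW`, `indirect_share`, `direct_share`) and all values are illustrative.

```python
# Hypothetical toy setup: 2 topics and 3 time slices (e.g., consecutive years).
TOPICS = ["nlp", "vision"]
N_SLICES = 3

# Indirect model (assumed): one temporal profile p(t | z) per topic, shared by
# every author, plus a static author-topic mixture p(z | a).
TOPIC_TIME = {"nlp": [0.2, 0.3, 0.5], "vision": [0.5, 0.3, 0.2]}
AUTHOR_TOPIC = {
    "alice": {"nlp": 0.5, "vision": 0.5},
    "bob": {"nlp": 0.5, "vision": 0.5},
}

# Direct model (assumed): each author owns a topic distribution per slice,
# p(z | a, t). Averaged over the slices, both authors match AUTHOR_TOPIC.
AUTHOR_FLOW = {
    "alice": [{"nlp": 0.1, "vision": 0.9}, {"nlp": 0.5, "vision": 0.5},
              {"nlp": 0.9, "vision": 0.1}],
    "bob": [{"nlp": 0.9, "vision": 0.1}, {"nlp": 0.5, "vision": 0.5},
            {"nlp": 0.1, "vision": 0.9}],
}


def indirect_share(author, topic):
    """p(z | a, t) proportional to p(z | a) * p(t | z), built only from topic-level curves."""
    mix = AUTHOR_TOPIC[author]
    shares = []
    for t in range(N_SLICES):
        # Normalize over topics so the shares at slice t sum to 1.
        norm = sum(mix[z] * TOPIC_TIME[z][t] for z in TOPICS)
        shares.append(mix[topic] * TOPIC_TIME[topic][t] / norm)
    return shares


def direct_share(author, topic):
    """p(z | a, t) read directly from the author's own temporal variable."""
    return [slice_probs[topic] for slice_probs in AUTHOR_FLOW[author]]


if __name__ == "__main__":
    for name in ("alice", "bob"):
        ind = [round(p, 3) for p in indirect_share(name, "nlp")]
        dirc = [round(p, 3) for p in direct_share(name, "nlp")]
        print(f"{name}: indirect={ind} direct={dirc}")
```

Running the script prints `indirect=[0.286, 0.5, 0.714]` for both authors. Both authors have the same p(z | a), and both inherit the same topic-level curves. The direct variables, by contrast, show alice moving toward "nlp" (`[0.1, 0.5, 0.9]`) and bob moving away from it (`[0.9, 0.5, 0.1]`). Under these toy assumptions, this is the kind of author-specific flow that a purely topic-derived construction cannot express.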

1 Introduction

As the number of Web documents increases exponentially, it becomes important to develop methods that extract useful information or knowledge from them. There are many knowledge discovery problems, one of which is the discovery of academic research interests. Discovering research interests can give insight into the research trends of a particular period and can further help researchers make informed decisions about their future research topics. It is important to note the difference between discovering academic interests and identifying experts [7]. The task of discovering academic interests is to find people who write about particular topics, and the identifyi