Understanding the limitations of network online learning
- PDF / 5,769,784 Bytes
- 25 Pages / 595 x 794 pts Page_size
- 12 Downloads / 216 Views
Applied Network Science
RESEARCH
Open Access
Understanding the limitations of network online learning Timothy LaRock1*
, Timothy Sakharov1 , Sahely Bhadra3 and Tina Eliassi-Rad1,2
*Correspondence: [email protected] 1 Network Science Institute, Northeastern University, 360 Huntington Ave, Boston MA, 02115-5000 USA Full list of author information is available at the end of the article
Abstract Studies of networked phenomena, such as interactions in online social media, often rely on incomplete data, either because these phenomena are partially observed, or because the data is too large or expensive to acquire all at once. Analysis of incomplete data leads to skewed or misleading results. In this paper, we investigate limitations of learning to complete partially observed networks via node querying. Concretely, we study the following problem: given (i) a partially observed network, (ii) the ability to query nodes for their connections (e.g., by accessing an API), and (iii) a budget on the number of such queries, sequentially learn which nodes to query in order to maximally increase observability. We call this querying process Network Online Learning and present a family of algorithms called NOL*. These algorithms learn to choose which partially observed node to query next based on a parameterized model that is trained online through a process of exploration and exploitation. Extensive experiments on both synthetic and real world networks show that (i) it is possible to sequentially learn to choose which nodes are best to query in a network and (ii) some macroscopic properties of networks, such as the degree distribution and modular structure, impact the potential for learning and the optimal amount of random exploration. Keywords: Partially observed networks, Online learning, Heavy-tailed target distributions
Introduction Incomplete datasets are common in the analysis of networks because the phenomena under study are often partially observed. It has been shown that analysis of incomplete networks may lead to biased results (Sanz et al. 2012; Ghosh et al. 2013; GonzálezBailón et al. 2014; Sampson et al. 2015; Alves et al. 2020). Our work seeks to address the following problem: Given a partially observed network with no information about how it was observed and a budget to query the partially observed nodes, can we learn to sequentially ask optimal queries relative to some objective? In addition to introducing new methodology, we study when learning is feasible in this problem (see Fig. 1). We present a family of online learning algorithms, called NOL*, which are based on online regression and related to reinforcement learning techniques like approximate Qlearning. NOL* algorithms do not assume any a priori knowledge or estimate of the true © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original
Data Loading...