Ising Bandits with Side Information

We develop an online learning algorithm for bandits on a graph with side information where there is an underlying Ising distribution over the vertices at low temperatures. We are motivated from practical settings where the graph state in a social or a com

PDF / 1,023,969 Bytes
16 Pages / 439.37 x 666.142 pts Page_size
110 Downloads / 241 Views

DOWNLOAD

REPORT

Abstract. We develop an online learning algorithm for bandits on a graph with side information where there is an underlying Ising distribution over the vertices at low temperatures. We are motivated from practical settings where the graph state in a social or a computer hosts network (potentially) changes at every trial; intrinsically partitioning the graph thus requiring the learning algorithm to play the bandit from the current partition. Our algorithm essentially functions as a two stage process. In the ﬁrst stage it uses “minimum-cut ” as the regularity measure to compute the state of the network by using the side label received and acting as a graph classiﬁer. The classiﬁer internally uses a polynomial time linear programming relaxation technique that incorporates the known information to predict the unknown states. The second stage ensures that the bandits are sampled from the appropriate partition of the graph with the potential for exploring the other part. We achieve this by running the adversarial multi armed bandit for the edges in the current partition while exploring the “cut” edges. We empirically evaluate the strength of our approach through synthetic and real world datasets. We also indicate the potential for a linear time exact algorithm for calculating the max-ﬂow as an alternative to the linear programming relaxation, besides promising bounded mistakes/regret in the number of times the “cut” changes.

1

Introduction

Many domains encounter a problem in collection of annotated training data due to the diﬃculty and costs in requiring eﬀorts of human annotators, while the abundant unlabelled data come for free. What makes the problem more challenging is the data might often exhibit complex interactions that violate the independent and identically distributed assumption of the data generation process. In such domains, it is imperative that learning techniques can learn from unlabelled data and the rich interactions based structure of the data. Learning from unlabelled and a few labelled data falls under the purview of semi-supervised learning. Coupling it with an encoding of the data interdependencies as a graph, results in an attractive problem of learning on graphs. Often, interesting applications are tied to such problems with rich underlying structure. For example, consider the system of online advertising; serving advertisements on web pages in an incremental fashion. The web pages can be represented as vertices in the graph with the links as edges. At given time t, c Springer International Publishing Switzerland 2015 A. Appice et al. (Eds.): ECML PKDD 2015, Part I, LNAI 9284, pp. 448–463, 2015. DOI: 10.1007/978-3-319-23528-8 28

Ising Bandits with Side Information

449

the system receives a request to serve an advertisement on a randomly selected web-page. Moreover, at the same time, the system receives a side information about the state of the web-page: for simplicity we assume the side information to be a rating of 0 or 1. As a consequence, the advertisement pool change with the change in

Data Loading...

Ising Bandits with Side Information

Recommend Documents

Recommending content using side information

Thompson Sampling for Bayesian Bandits with Resets

Quantum bandits

Partial multi-label learning with noisy side information

Weakly Supervised Learning with Side Information for Noisy Labeled Images

Deep Image Compression Using Decoder Side Information

Incorporating Side Information by Adaptive Convolution

Refined Algorithms for Infinitely Many-Armed Bandits with Deterministic Rewards

Quantum Ising Phases and Transitions in Transverse Ising Models

Online Semi-supervised Learning in Contextual Bandits with Episodic Reward

Ising. A Magnetic Modesty

POI Recommendations Using Self-attention Based on Side Information