Thompson Sampling for Bayesian Bandits with Resets

Abstract. Multi-armed bandit problems are challenging sequential decision problems that have been widely studied, as they constitute a mathematical framework that abstracts many different decision problems in fields such as machine learning, logistics, industrial optimization, management of clinical trials, etc. In this paper we address a non-stationary environment with expected rewards that are dynamically evolving, considering a particular type of drift, which we call resets, in which the arm qualities are re-initialized from time to time. We compare different arm selection strategies with simulations, focusing on a Bayesian method based on Thompson sampling (a simple, yet effective, technique for trading off between exploration and exploitation).

1 Introduction

The multi-armed bandit problem is a general framework that can represent many different sequential decision problems. Essentially, a multi-armed bandit is a slot machine with n arms (representing possible decisions), each associated with a different and unknown expected payoff (reward). The problem is to select the optimal (or near-optimal) sequence of arms to pull in order to maximize reward (in expectation). Rewards obtained at earlier steps are taken into account in order to identify arms associated with high payoff; but since rewards are uncertain, several pulls of the same arm are usually necessary to assess its quality with some confidence. A key concept in bandit problems is the trade-off between exploitation (pulling the arm with the highest estimated expected payoff) and exploration (gathering more information about the expected payoffs of the other arms).

A large number of works have addressed bandit problems [12,11,2,9,14,10]. In particular, considerable attention has been given to methods (including heuristics) that are computationally fast and produce a decision about the next arm to pull in very short time. These include action-value strategies, but also the family of UCB methods [1].

Thompson sampling is a simple and effective strategy for multi-armed bandits. It can be implemented very efficiently and is based on principled Bayesian reasoning. Essentially, this strategy maintains a probabilistic estimate of the value of each arm and selects arms according to their probability of being the optimal arm (it is a randomized selection strategy). This idea was first described eighty years ago [13]. However, it was surprisingly neglected until its recent rediscovery [7,8,5,9], which showed its effectiveness in a number of different settings.

Most works on bandits assume a stationary distribution of rewards. In many situations, however, one cannot expect the quality of the different arms to be constant.
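To make the idea concrete, the following is a minimal sketch of Thompson sampling for a stationary Bernoulli bandit with independent Beta priors on the arms' success probabilities. It illustrates the general principle described above rather than the paper's exact experimental setup; the horizon, number of arms, and reward probabilities in the example call are hypothetical.

```python
import numpy as np

def thompson_sampling(true_means, horizon, rng=None):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    Maintains a Beta(alpha, beta) posterior over each arm's success
    probability; at every step it draws one sample per arm from its
    posterior and pulls the arm with the largest sampled value, so each
    arm is selected with (approximately) its probability of being optimal.
    """
    rng = rng or np.random.default_rng()
    n_arms = len(true_means)
    alpha = np.ones(n_arms)   # posterior "successes" (uniform Beta(1,1) prior)
    beta = np.ones(n_arms)    # posterior "failures"
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its current posterior.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = rng.binomial(1, true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Hypothetical example: three arms with unknown success probabilities.
print(thompson_sampling([0.2, 0.5, 0.8], horizon=1000))
```

Randomizing the arm choice through posterior samples is what balances exploration and exploitation: arms with uncertain posteriors occasionally produce large samples and get pulled, while clearly inferior arms are sampled less and less often.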


If the values of the different arms change with time (non-stationarity), the situation is that of dynamic bandits (also called restless bandits).
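As a rough illustration of the kind of non-stationarity considered here, the sketch below simulates an environment in which the arm qualities are re-initialized from time to time. The reset mechanism (a per-step reset probability and uniformly re-drawn means) is an illustrative assumption, not the paper's specification of resets.

```python
import numpy as np

class ResettingBernoulliBandit:
    """Hypothetical sketch of a non-stationary Bernoulli bandit with resets:
    with some small probability at each step, every arm's success
    probability is re-drawn uniformly at random."""

    def __init__(self, n_arms, reset_prob, rng=None):
        self.rng = rng or np.random.default_rng()
        self.n_arms = n_arms
        self.reset_prob = reset_prob              # per-step probability of a reset
        self.means = self.rng.uniform(size=n_arms)

    def pull(self, arm):
        # Occasionally re-initialize all arm qualities ("reset"),
        # then return a Bernoulli reward for the pulled arm.
        if self.rng.random() < self.reset_prob:
            self.means = self.rng.uniform(size=self.n_arms)
        return self.rng.binomial(1, self.means[arm])
```

Under such an environment, a learner that keeps accumulating evidence forever becomes overconfident in stale estimates, which is why selection strategies need to be adapted to drift of this kind.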