Robust experimentation in the continuous time bandit problem

Farzad Pourbabaee

Received: 9 January 2020 / Accepted: 9 November 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract We study the experimentation dynamics of a decision maker (DM) in a two-armed bandit setup (Bolton and Harris in Econometrica 67(2):349–374, 1999), where the agent holds ambiguous beliefs regarding the distribution of the return process of one arm and is certain about the other one. The DM entertains multiplier preferences à la Hansen and Sargent (Am. Econ. Rev. 91(2):60–66, 2001), so we frame the decision-making environment as a two-player differential game against nature in continuous time. We characterize the DM’s value function and her optimal experimentation strategy, which turns out to follow a cut-off rule with respect to her belief process. The belief threshold for exploring the ambiguous arm is found in closed form and is shown to be increasing with respect to the ambiguity aversion index. We then study the effect of providing an unambiguous information source about the ambiguous arm. Interestingly, we show that the exploration threshold rises unambiguously as a result of this new information source, thereby leading to more conservatism. This analysis also sheds light on the efficient time to seek an expert opinion.

Keywords Model uncertainty · Dynamic experimentation · Variational preferences · Information valuation · Ambiguous diffusion

JEL Classification C44 · C61 · C73 · D81 · D83

I would like to thank Robert M. Anderson, Philipp Strack, Gustavo Manso and Demian Pouzo for the support and guidance over the course of this paper, and I am grateful to Haluk Ergin, Chris Shannon and David Ahn for the valuable comments and suggestions. All remaining errors are mine.

Farzad Pourbabaee [email protected] University of California, 414 Evans Hall, Berkeley, CA 94720, USA

1 Introduction

There are natural cases where experimentation must be performed in ambiguous environments, where the distribution of future shocks is unknown. For example, consider a diagnostician who has two treatments for a particular set of symptoms. One is the conventional treatment, which has been widely tested and has a known success rate. The other is a recently discovered treatment that still requires further study. The diagnostician must perform a sequence of experiments on patients to determine the success/failure rate of the new treatment. However, the adverse effects of mistreatment on certain types of patients are fatal, so the medics must consider the worst-case scenario for the patients while evaluating the new treatment. As another case, consider the R&D example of Weitzman (1979), where the research department of an organization is assigned the task of selecting one of two technologies producing the same commodity. The research division holds a prior on the savings generated by each technology, but the observations of each alternative during the exper