Robust valence-induced biases on motor response and confidence in human reinforcement learning

Chih-Chung Ting 1 & Stefano Palminteri 2,3,4 & Jan B. Engelmann 1,5,6 & Maël Lebreton 7,8

© The Author(s) 2020

Abstract
In simple instrumental-learning tasks, humans learn to seek gains and to avoid losses equally well. Yet, two effects of valence are observed. First, decisions in loss contexts are slower. Second, loss contexts decrease individuals’ confidence in their choices. Whether these two effects are manifestations of a single mechanism or can be partially dissociated is unknown. Across six experiments, we attempted to disrupt the valence-induced motor bias by manipulating the mapping between decisions and actions and by imposing constraints on response times (RTs). Our goal was to assess the presence of the valence-induced confidence bias in the absence of the RT bias. We observed both motor and confidence biases despite our disruption attempts, establishing that the effects of valence on motor and metacognitive responses are very robust and replicable. Nonetheless, within- and between-individual inferences reveal that the confidence bias resists the disruption of the RT bias. Therefore, although concomitant in most cases, valence-induced motor and confidence biases seem to be partly dissociable. These results highlight new important mechanistic constraints that should be incorporated into learning models to jointly explain choice, reaction times, and confidence.

Keywords: Meta-cognition · Reinforcement learning · Confidence · Valence-induced bias

Introduction

In the reinforcement-learning context, reward-seeking and punishment-avoidance present an intrinsic and fundamental informational asymmetry. In the former situation, accurate choice (i.e., reward maximization) increases the frequency of the reinforcer (the reward). In the latter situation, accurate choice (i.e., successful avoidance) decreases the frequency of the reinforcer (the punishment). Accordingly, most

simple incremental “law-of-effect”-like models would predict higher performance in the reward-seeking compared with the punishment-avoidance situation. Yet, humans learn to seek reward and to avoid punishment equally well (Fontanesi et al., 2019; Guitart-Masip et al., 2012; Palminteri et al., 2015). This is not only robustly demonstrated in experimental data, but also nicely explained by context-dependent reinforcement-learning models (Fontanesi et al., 2019; Palminteri et al., 2015), which can be seen as formal computational instantiations of
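To make the intuition behind these context-dependent models concrete, the following sketch (not from the original article; all parameter values and function names are illustrative) contrasts a standard “absolute” Q-learner with a variant in the spirit of Palminteri et al. (2015) that re-centers outcomes on a learned context value:

```python
import math
import random

def simulate(context, n_trials=200, alpha=0.3, beta=5.0,
             relative=True, seed=0):
    """Simulate one agent on a two-armed bandit.

    context  -- 'gain': the correct option yields +1 with p = .75
                (0 otherwise), the other option with p = .25;
                'loss': the correct option yields -1 with p = .25
                (0 otherwise), the other option with p = .75.
    relative -- if True, outcomes are re-centered on a learned
                context value V before updating the option values Q
                (context-dependent model); if False, raw outcomes
                are used (absolute model).
    Returns the fraction of correct choices.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # option values; option 1 is the "correct" one
    v = 0.0         # running estimate of the context value
    n_correct = 0
    for _ in range(n_trials):
        # Softmax choice between the two options.
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        choice = 1 if rng.random() < p1 else 0
        n_correct += choice
        # Draw the outcome for the chosen option.
        if context == 'gain':
            p_hit = 0.75 if choice == 1 else 0.25
            r = 1.0 if rng.random() < p_hit else 0.0
        else:
            p_hit = 0.25 if choice == 1 else 0.75
            r = -1.0 if rng.random() < p_hit else 0.0
        # Prediction-error update on relative or absolute outcomes.
        if relative:
            v += alpha * (r - v)   # learn the context value
            target = r - v         # outcome relative to the context
        else:
            target = r             # raw outcome
        q[choice] += alpha * (target - q[choice])
    return n_correct / n_trials

for ctx in ('gain', 'loss'):
    acc = sum(simulate(ctx, seed=s) for s in range(100)) / 100
    print(f'{ctx} context, relative coding: {acc:.2f} correct')
```

Under the relative coding, an avoided punishment (outcome 0 in a context whose learned value is negative) generates a positive teaching signal, which is how these models reconcile successful avoidance with reinforcement-based learning and hence the equal performance across valence contexts described above.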

Jan B. Engelmann and Maël Lebreton share senior authorship.

Electronic supplementary material The online version of this article (https://doi.org/10.3758/s13415-020-00826-0) contains supplementary material, which is available to authorized users.

* Maël Lebreton
  [email protected]

1 CREED, Amsterdam School of Economics (ASE), Universiteit van Amsterdam, Amsterdam, the Netherlands
2 Département d’études cognitives, Ecole Normale Supérieure, Paris, France
3 Laboratoire de Neurosciences Cogni