P282 Population-level mechanisms of model arbitration in the prefrontal cortex
Jae Hyung Woo1*, Michael C. Wang1*, Ramon Bartolo2, Bruno B. Averbeck3, Alireza Soltani1+
1Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
2Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
3Laboratory of Neuropsychology, National Institute of Mental Health, Bethesda, MD 20892, USA
+Email: alireza.soltani@gmail.com
Introduction
One of the biggest challenges of learning in naturalistic settings is that every choice option involves multiple attributes or features, each of which could potentially predict the outcome. To make decisions accurately and efficiently, the brain must attribute the outcome to the relevant features of a choice or action while disregarding the irrelevant ones. To manage this uncertainty, it has been proposed that the brain maintains several internal models of the environment, each predicting outcomes based on a different attribute of the choice options, and uses the reliability of these models to select the appropriate one to guide decision making [1-3].
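For concreteness, a minimal sketch of one possible reliability-based arbitration scheme is given below; the sigmoidal weighting, parameter names, and example numbers are illustrative assumptions rather than the specific arbitration rule studied here.

```python
# Minimal sketch of reliability-based arbitration between two internal models.
# The sigmoidal weighting below is an illustrative assumption, not the exact rule
# used in this study.
import numpy as np

def arbitration_weight(rel_stim, rel_act, temperature=1.0):
    """Weight given to the stimulus-based model, based on the two models' reliabilities."""
    return 1.0 / (1.0 + np.exp(-(rel_stim - rel_act) / temperature))

def combined_value(v_stim, v_act, omega):
    """Combine stimulus-based and action-based values using the arbitration weight omega."""
    return omega * v_stim + (1.0 - omega) * v_act

# Example: the stimulus-based model is currently more reliable, so it dominates.
omega = arbitration_weight(rel_stim=0.8, rel_act=0.3)
print(omega, combined_value(v_stim=0.6, v_act=0.2, omega=omega))
```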
Methods
To uncover the computational and neural mechanisms underlying model arbitration, we reanalyzed data from high-density recordings of lateral prefrontal cortex (PFC) activity in monkeys performing a probabilistic reversal learning task with uncertainty about the correct model of the environment. We constructed multiple reinforcement learning (RL) models and fit them to choice behavior on a trial-by-trial basis, which allowed us to infer the animals' learning and arbitration strategies. We then used estimates from the best-fitting model to identify single-cell and population-level neural signals related to learning and arbitration in the lateral PFC.
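A minimal sketch of this kind of trial-by-trial model fitting is shown below, assuming Rescorla-Wagner updates for stimulus and action values, a softmax choice rule, and a fixed arbitration weight fit by maximum likelihood; the synthetic data and parameterization are illustrative assumptions and do not correspond to the best-fitting model reported here.

```python
# Illustrative trial-by-trial fit of a simple arbitration RL model to choice data.
# Update rules, arbitration scheme, and task structure are assumptions for illustration.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, stim_on_left, actions, rewards):
    """params = (alpha, beta, omega): learning rate, inverse temperature,
    and arbitration weight between stimulus-based and action-based values."""
    alpha, beta, omega = params
    v_stim = np.zeros(2)   # values of the two stimulus identities
    v_act = np.zeros(2)    # values of the two actions (left, right)
    nll = 0.0
    for sl, a, r in zip(stim_on_left, actions, rewards):
        stim_at = np.array([sl, 1 - sl])                  # stimulus identity at each location
        q = omega * v_stim[stim_at] + (1 - omega) * v_act  # arbitrated value of each action
        p = np.exp(beta * q - np.max(beta * q))
        p /= p.sum()
        nll -= np.log(p[a] + 1e-12)
        s = stim_at[a]                                     # identity of the chosen stimulus
        v_stim[s] += alpha * (r - v_stim[s])               # stimulus-based RW update
        v_act[a] += alpha * (r - v_act[a])                 # action-based RW update
    return nll

# Example usage with synthetic data (2 stimuli, 2 actions, binary reward).
rng = np.random.default_rng(0)
T = 200
stim_on_left = rng.integers(0, 2, T)
actions = rng.integers(0, 2, T)
rewards = rng.integers(0, 2, T)
fit = minimize(neg_log_likelihood, x0=[0.3, 3.0, 0.5],
               args=(stim_on_left, actions, rewards),
               bounds=[(0.01, 1.0), (0.1, 20.0), (0.0, 1.0)])
print(fit.x)  # estimated (alpha, beta, omega)
```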
Results
We found evidence of dynamic, competitive interactions between stimulus-based and action-based learning, alongside single-cell and population-level representations of the arbitration weight. Arbitration enhanced the representation of task-relevant variables, suppressed that of irrelevant ones, and modulated the geometry of PFC representations by aligning the differential-value axes with the choice axis when relevant and making them orthogonal when irrelevant. Reward feedback emerged as a potential mechanism for these changes: reward enhanced the representation of the relevant differential values and of choice, while adjusting the alignment between the differential-value and choice subspaces according to the adopted learning strategy.
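A minimal sketch of one way to quantify such alignment is given below, assuming linear population coding axes estimated by per-neuron regression and compared by cosine similarity; this is an illustrative analysis on synthetic data, not necessarily the exact geometric analysis used in this study.

```python
# Illustrative measure of alignment between a differential-value axis and a choice
# axis in population activity. The regression-based axis estimate and cosine metric
# are assumptions for illustration.
import numpy as np

def coding_axis(firing_rates, variable):
    """Per-neuron regression slope of firing rate on a task variable (trials x neurons)."""
    x = variable - variable.mean()
    return (firing_rates - firing_rates.mean(axis=0)).T @ x / (x @ x)

def alignment(axis_a, axis_b):
    """Cosine of the angle between two population coding axes (1 = aligned, 0 = orthogonal)."""
    return np.abs(axis_a @ axis_b) / (np.linalg.norm(axis_a) * np.linalg.norm(axis_b))

# Example with synthetic data: 500 trials x 50 neurons.
rng = np.random.default_rng(1)
n_trials, n_neurons = 500, 50
diff_value = rng.normal(size=n_trials)                      # differential value (e.g., left - right)
choice = np.sign(diff_value + rng.normal(size=n_trials))    # choices loosely follow value
rates = (np.outer(diff_value, rng.normal(size=n_neurons)) +
         np.outer(choice, rng.normal(size=n_neurons)) +
         rng.normal(size=(n_trials, n_neurons)))
print(alignment(coding_axis(rates, diff_value), coding_axis(rates, choice)))
```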
Discussion
Overall, our results shed light on two major mechanisms for the dynamic interaction between model arbitration and value representation in the lateral PFC. Moreover, they provide evidence for a unified set of computational and neural mechanisms supporting behavioral flexibility in naturalistic environments, where no cue explicitly signals the correct model of the environment.