P072 Partial models for learning in nonstationary environments
Christopher R. Dunne*1 and Dhruva V. Raman1
1Department of Informatics, University of Sussex, Falmer, United Kingdom
*Email: C.Dunne@sussex.ac.uk
Introduction The computational goal of learning is often presented as optimality on a given task, subject to constraints. However, the notion of optimality makes strong assumptions: by definition, changes to the task structure will render an optimal agent suboptimal. This is problematic in ethological settings, where an agent lacks the time or data to accurately model a task's latent states. We present a novel design principle (Hetlearn) for learning in nonstationary environments, inspired by the Drosophila mushroom body. It makes specific predictions about what should be inferred from the environment, distinct from those of Bayesian inference. We show that Hetlearn outperforms an optimal Bayesian agent and better matches human and macaque behavioural data.

Methods We consider a reward prediction error (RPE) task in which an animal updates a valence estimate after each RPE (Fig. 1E). Critically, the degree to which an RPE changes the valence is set by a learning rate. To adapt this learning rate, Hetlearn runs parallel sublearners with heterogeneous fixed assumptions about the environment (varied fixed learning rates). Ensemble predictions are formed by a weighted vote, with weights determined by recent sublearner performance (see the illustrative sketch below). This allows rapid adaptation to unpredictable environmental changes without explicitly estimating complex latent variables. We compare Hetlearn against an advanced Bayesian agent [1] and against behavioural data from humans and macaques [2, 3, 4].

Results Hetlearn outcompetes the Bayesian agent [1] on reward learning in nonstationary environments (Fig. 1A-D). It is also algorithmically simpler: it builds only a partial generative model and does not track complex environmental statistics. Nonetheless, it fits behavioural data from humans and macaques as well as previous models do [2, 4] (Fig. 1F-G). This is notable because qualitatively different models (Bayes-optimal versus suboptimal) previously provided the best respective fits to these two datasets [2, 3]. As such, Hetlearn offers a unified learning principle for seemingly disparate strategies. Finally, Hetlearn is robust to model misspecification: its parameters can vary by an order of magnitude without a decline in performance.

Discussion Hetlearn outcompetes the Bayesian agent [1] in part because it exploits a bottleneck in the learning process: an optimal learner must infer multiple quantities that all impact a single bounded parameter, the learning rate. Conversely, Hetlearn simply tracks the recent performance of parallel learners with heterogeneous learning rates. In effect, it trades optimal performance in a stationary environment for generalisability across environments. The result is superior performance in unpredictably changing environments, or under limited time or data, which are precisely the conditions in which animals outperform artificial neural networks. Crucially, Hetlearn generates new, testable predictions about what should be inferred from the environment in these regimes.
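To make the ensemble mechanism concrete, the following is a minimal sketch assuming a delta-rule valence update and an inverse-error weighted vote; the learning-rate grid, the exponential error memory, and the exact weighting rule are illustrative assumptions, not the published implementation.

# Minimal Hetlearn sketch (parameter values and weighting rule are
# illustrative assumptions, not the authors' published implementation).
import numpy as np

def hetlearn(rewards, learning_rates=(0.05, 0.1, 0.2, 0.4, 0.8), memory=0.9):
    alphas = np.asarray(learning_rates, dtype=float)
    values = np.zeros_like(alphas)   # each sublearner's valence estimate
    errors = np.ones_like(alphas)    # running average of squared RPE per sublearner
    predictions = []
    for r in rewards:
        # Weighted vote: recently accurate sublearners receive more weight.
        weights = 1.0 / (errors + 1e-8)
        weights /= weights.sum()
        predictions.append(float(weights @ values))
        # Delta rule: each sublearner shifts its valence by its own fixed
        # learning rate times the reward prediction error.
        rpe = r - values
        errors = memory * errors + (1.0 - memory) * rpe ** 2
        values += alphas * rpe
    return np.array(predictions)

# Example: a change-point environment. After the reward mean jumps, the
# vote shifts towards fast sublearners, then drifts back to slow ones.
rng = np.random.default_rng(0)
rewards = np.concatenate([rng.normal(1.0, 0.2, 200), rng.normal(-1.0, 0.2, 200)])
estimates = hetlearn(rewards)

Note that no latent volatility or stochasticity is estimated explicitly; the vote over fixed-learning-rate sublearners tracks these quantities only implicitly, as illustrated in Fig. 1E.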
Figure 1. (A) Environments with varying statistics. (B, C) Learning rate tracking by Hetlearn and Bayesian agent [1]. (D) Hetlearn has lower mean squared error (MSE) across environments. (E) Reward prediction error (RPE) task. Bayesian agent explicitly tracks complex latent states (volatility and stochasticity) that Hetlearn tracks only implicitly. (F, G) Hetlearn matches human [2] and macaque [3, 4] data.

Acknowledgements This research was supported by the Leverhulme Doctoral Scholarship programme be.AI – Biomimetic Embodied Artificial Intelligence at the University of Sussex.

References
[1] https://doi.org/10.1038/s41467-021-26731-9
[2] https://doi.org/10.1038/nn1954
[3] https://doi.org/10.1038/nn.3918
[4] https://doi.org/10.1016/j.neuron.2017.03.044