Sunday July 6, 2025 17:20 - 19:20 CEST | Passi Perduti
P074 Hybridizing Machine Reinforcement Learning with Neuromimetic Navigation Systems

Christopher Earl (3), Moshe Tannenbaum (4), Haroon Anwar (1), Hananel Hazan (4), Samuel Neymotin (1,2)
(1) Center for Biomedical Imaging and Neuromodulation, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA.
(2) Department of Psychiatry, NYU School of Medicine, New York, NY, USA.
(3) Department of Computer Science, University of Massachusetts, Amherst, MA, USA.
(4) Allen Discovery Center, Tufts University, Boston, MA, USA.
Introduction

Animal brains can remember explored locations and complex pathways in order to reach goals efficiently. Many neuroscience studies have investigated the cellular and circuit basis of spatial representations in the hippocampus (HPC) and entorhinal cortex (EC); however, the mechanisms that enable this complex navigation remain poorly understood. In computer science, Q-Learning (QL) is a Reinforcement Learning (RL) algorithm that builds associations between contexts, actions, and long-term consequences. In this study, we develop a bio-inspired neuronal network model that integrates cell types from the mammalian EC and HPC with QL to simulate how the brain could learn to navigate new environments.
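For concreteness, the tabular QL update referred to throughout can be written in a few lines of Python. This is a minimal sketch of standard Q-Learning, not the study's implementation; the maze size, learning rate, and discount factor are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 25, 4          # e.g. a 5x5 maze with up/down/left/right moves (assumed)
alpha, gamma = 0.1, 0.9              # learning rate and discount factor (assumed values)
Q = np.zeros((n_states, n_actions))  # the Q-Table

def q_update(state, action, reward, next_state):
    """One Bellman backup: propagate sparse, delayed reward back toward earlier states."""
    td_error = reward + gamma * Q[next_state].max() - Q[state, action]
    Q[state, action] += alpha * td_error
    return td_error  # a signed teaching signal that could, for example, modulate plasticity
```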
Methods
We used the BindsNET platform [1] to model Grid Cells (GC), Place Cells (PC), and motor control cells that drive agent actions. Our model is a Spiking Neuronal Network (SNN) of leaky integrate-and-fire (LIF) neurons, organized to mimic the GCs and PCs found in the EC and HPC (Fig 1). Reward-Modulated Spike-Timing-Dependent Plasticity (RM-STDP) [2,3] applied to the synapses that activate motor control cells drives learning. The RM-STDP mechanism receives its reward signal from a Q-Table, helping the agent associate actions with long-term consequences. The agent is tasked with navigating a maze and learning a path to a goal (Fig 2). Feedback is given only at the goal, so the agent must associate actions with long-term outcomes to solve the maze.
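A minimal BindsNET-style sketch of this arrangement is given below. It assumes BindsNET's Network, Input, LIFNodes, Connection, and MSTDP (reward-modulated STDP) classes; the layer sizes, wiring, spike statistics, and reward value are illustrative placeholders rather than the configuration used in the study.

```python
import torch
from bindsnet.network import Network
from bindsnet.network.nodes import Input, LIFNodes
from bindsnet.network.topology import Connection
from bindsnet.learning import MSTDP  # reward-modulated STDP rule shipped with BindsNET

sim_time = 100  # ms of simulation per agent step (illustrative)

net = Network(dt=1.0)
grid_cells = Input(n=100)       # GC population encoding the agent's location (size assumed)
place_cells = LIFNodes(n=200)   # reservoir population that can develop PC-like responses
motor_cells = LIFNodes(n=4)     # one motor unit per maze action

net.add_layer(grid_cells, name="GC")
net.add_layer(place_cells, name="PC")
net.add_layer(motor_cells, name="Motor")

# Fixed GC -> PC projection; plastic, reward-modulated PC -> Motor projection.
net.add_connection(Connection(source=grid_cells, target=place_cells),
                   source="GC", target="PC")
net.add_connection(Connection(source=place_cells, target=motor_cells,
                              update_rule=MSTDP, nu=1e-2),
                   source="PC", target="Motor")

# One agent step: drive the GCs with the current location code and pass the
# Q-Table-derived reward so MSTDP can modulate the PC -> Motor synapses.
gc_spikes = torch.bernoulli(0.1 * torch.ones(sim_time, grid_cells.n)).byte()
net.run(inputs={"GC": gc_spikes}, time=sim_time, reward=0.0)
```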
Results
Trained models successfully and consistently navigated randomly generated mazes. GC populations encoded distinct physical locations as unique activity patterns, enabling the agent to distinguish between them. This let the agent remember previously visited areas and associate them with actions. Combined with QL, the long-term consequences of actions could also be retained, allowing the model to learn long paths to the goal from sparse environmental feedback.
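As a toy illustration of why such a population code keeps locations distinct (this is not the encoding used in the model), two grid modules with co-prime spatial periods already assign a unique joint phase to every cell of a small maze:

```python
# Toy grid-code illustration: two modules with co-prime spatial periods tile the
# maze, and their joint phases uniquely label every cell of a 5x5 environment.
periods = [2, 3]  # module periods in maze cells (illustrative)

def grid_code(x, y):
    """Return the phase of each grid module at position (x, y)."""
    return tuple((x % p, y % p) for p in periods)

codes = {grid_code(x, y) for x in range(5) for y in range(5)}
assert len(codes) == 25  # every cell receives a distinct population code
```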

Certain cells in the reservoir population fired only when the agent was in a specific location of the maze, suggesting that these cells naturally developed PC-like characteristics. When GCs were re-oriented in a new maze, the PCs remapped, similar to behavior observed in biology [4].
Discussion
We designed an SNN model that mimics the mammalian brain’s spatial representation system and integrated it with QL to solve a maze task. Our model forms the basis of a functional navigation system by effectively associating actions with long-term consequences. While the QL component is not biologically plausible, we believe higher-order brain areas could provide similar computational capabilities; in future work, we aim to implement QL itself as an SNN. Our results also suggest an explanation for the emergence of PCs in the HPC from upstream GC activity in the EC. Moreover, the GC spatial representations are likely generalizable beyond a maze, and future research could use our model’s GC-PC architecture to navigate more complex environments.



Fig 1: Diagram of the bio-inspired SNN and its relationship to QL. The bio-inspired SNN generates an action, feedback from the environment is written into a data structure called a ‘Q-Table’, and updates from this table modulate the RM-STDP synapses in the SNN.
Fig 2: Example 5x5 maze environment. The green dot marks the start, the blue dot the goal, the yellow dot the agent’s position, and the red dots the optimal path.
Acknowledgements
Research supported by ARL Cooperative Agreement W911NF-22-2-0139 and ARL/ORAU Fellowships
References
[1] BindsNET: A machine learning-oriented spiking neural networks library in Python. Front Neuroinform. 2018;12:89.
[2] Training spiking neuronal networks to perform motor control using reinforcement and evolutionary learning. Front Comput Neurosci. 2022;16:1017284.
[3] Training a spiking neuronal network model of visual-motor cortex to play a virtual racket-ball game using reinforcement learning. PLoS One. 2022;17(5):e0265808.
[4] Remapping revisited: how the hippocampus represents different spaces. Nat Rev Neurosci. 2024;25(6):428-448.

