P271 Accelerated cortical microcircuit simulations on massively distributed memory
Catherine M. Schoefmann*1,2, Jan Finkbeiner1,2, Susanne Kunkel1
1Neuromorphic Software Ecosystems (PGI-15), Juelich Research Centre, Juelich, Germany 2RWTH Aachen University, Aachen, Germany
*Email: c.schoefmann@fz-juelich.de

Introduction Comprehensive simulation studies of the dynamical regimes of cortical networks with realistic synaptic densities depend on compute systems capable of running such models significantly faster than biological real time. Since CPUs remain the primary target for established simulators, frequent memory access with minimal compute is an inherent bottleneck of the von Neumann design. Distributed memory architectures, popularized by the demand for massively parallel and scalable processing in AI workloads, offer an alternative.
Methods We introduce extensible simulation technology for spiking networks on massively distributed memory using Graphcore's IPUs (https://www.graphcore.ai). We demonstrate the efficiency of the new technology with simulations of the microcircuit model of [1], which is commonly used as a reference benchmark. The model represents 1 mm² of cortical tissue, spans around 300 million synapses, and is considered a building block of cortical function. The spike dynamics are statistically verified by comparison with the same simulations run on CPU with NEST [2].
Results We present a custom semi-directed communication algorithm particularly suited to distributed and memory-constrained environments, which allows a controlled trade-off between performance and memory usage. Our simulation code achieves an acceleration factor of 15x relative to biological real time for the full-scale cortical microcircuit model on the smallest device configuration that fits the model in memory. This is competitive with the current record performance on a static FPGA cluster [3], and further speedup can be achieved at the cost of lower-precision weights.
Discussion With negligible compilation times, the simulation code can be extended seamlessly to a wide range of synapse and neuron models, as well as structural plasticity, unlocking a new class of models for extensive parameter-space explorations in computational neuroscience. Furthermore, we believe that our algorithm for scalable and parallelisable communication can be applied efficiently to other platforms.

Acknowledgements The presented conceptual and algorithmic work is part of our long-term collaborative project to provide the technology for neural systems simulations (https://www.nest-initiative.org). Compute time on a Graphcore Bow Pod64 was granted by the Argonne Leadership Computing Facility (ALCF). This work is partly funded by the Volkswagen Foundation.

References
[1] https://doi.org/10.1093/cercor/bhs358
[2] https://doi.org/10.5281/ZENODO.12624784
[3] https://doi.org/10.3389/fncom.2023.1144143