P317 Critical dynamics improve performance in deep learning
Simon Vock*1,2,3,4,5, Christian Meisel1,2,4,5,6
1Computational Neurology Lab, Department of Neurology, Charité – Universitätsmedizin, Berlin, Germany
2Berlin Institute of Health, Berlin, Germany
3Faculty of Life Sciences, Humboldt University Berlin, Germany
4Bernstein Center for Computational Neuroscience, Berlin, Germany
5NeuroCure Cluster of Excellence, Charité – Universitätsmedizin, Berlin, Germany
6Center for Stroke Research, Berlin, Germany
*Email: simon.vock@charite.de
Introduction
Deep neural networks (DNNs) have revolutionized AI, yet their vast parameter space makes training difficult, often leading to inefficiencies or outright failure. Their optimization remains largely heuristic, relying on trial-and-error design [1,2]. In biological networks, recent evidence suggests that critical phase transitions, which balance signal propagation between die-out and runaway excitation, are key for effective learning [3,4]. Inspired by this, we analyze 80 modern DNNs and uncover a fundamental link between performance and criticality, unifying diverse architectures under a single theoretical perspective. Building on this, we propose a novel training approach that guides DNNs toward criticality, enhancing performance on multiple datasets.
Methods
We characterize criticality in DNNs using three key metrics: a maximal dynamic range Δ [5], a branching parameter σ = 1 [6], and a largest Lyapunov exponent λ₀ = 0 [7]. Our statistical analysis employs Spearman's rank correlation, the Wilcoxon signed-rank test, the Mann-Whitney U test, and linear mixed-effects models. We investigate 80 highly optimized DNNs from TorchVision pre-trained on the ImageNet-1k dataset [8], and additionally use the Modified National Institute of Standards and Technology (MNIST) dataset, a standard computer-vision benchmark. Building on our findings, we develop a novel training objective that drives models toward criticality during training.
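As one concrete illustration, the sketch below shows how a branching parameter σ could be estimated for a feedforward network by comparing activity between consecutive layers; the hook-based extraction and the "active-unit" definition of activity are illustrative assumptions, not the exact estimator used in this work.

```python
# Minimal sketch: estimating a layer-to-layer branching parameter sigma.
# "Activity" of a layer = mean number of active (non-zero) post-ReLU units
# per sample; this definition is an assumption made for illustration.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

activities = []

def record_activity(module, inputs, output):
    # Count active units per sample, averaged over the batch.
    activities.append((output > 0).float().sum(dim=1).mean().item())

hooks = [m.register_forward_hook(record_activity)
         for m in model.modules() if isinstance(m, nn.ReLU)]

with torch.no_grad():
    model(torch.randn(64, 1, 28, 28))  # stand-in batch of MNIST-sized inputs

for h in hooks:
    h.remove()

# Branching parameter: average ratio of activity in consecutive layers.
# sigma ~ 1 indicates activity that neither dies out nor explodes.
ratios = [a_next / a for a, a_next in zip(activities[:-1], activities[1:]) if a > 0]
sigma = sum(ratios) / len(ratios)
print(f"estimated branching parameter sigma = {sigma:.3f}")
```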
Results
We derive a set of measures quantifying the distance to criticality for DNNs and analyze 80 pre-trained DNNs from TorchVision (ImageNet-1k). We find that, as test accuracies increased over the last decade, networks became significantly more critical. Test accuracies are highly correlated with both criticality and model size: a linear mixed-effects model shows that distance to criticality and model size together explain 60% of the variance in accuracy (R² = 0.60). A novel training objective that penalizes distance to criticality improves MNIST accuracy by up to 0.8% compared to highly optimized DNNs. In a continual learning setting on ImageNet, this approach enhances neuronal plasticity and outperforms established training techniques.
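The sketch below shows one way such a criticality-penalizing objective could be formed: a squared deviation of the layer-to-layer activity ratio from 1 is added to the task loss. The penalty form and the weight alpha are illustrative assumptions; the abstract does not specify the authors' exact objective.

```python
# Minimal sketch of a training step with a criticality penalty term.
import torch
import torch.nn as nn

def branching_penalty(activations):
    """Squared deviation of the layer-to-layer activity ratio from 1 (assumed form)."""
    penalty = activations[0].new_zeros(())
    for a, a_next in zip(activations[:-1], activations[1:]):
        ratio = a_next.abs().mean() / (a.abs().mean() + 1e-8)
        penalty = penalty + (ratio - 1.0) ** 2
    return penalty / (len(activations) - 1)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
alpha = 0.1  # weight of the criticality penalty (assumed value)

x = torch.randn(64, 1, 28, 28)       # stand-in for an MNIST batch
y = torch.randint(0, 10, (64,))

# Collect intermediate activations during the forward pass.
acts, h = [], x
for layer in model:
    h = layer(h)
    if isinstance(layer, (nn.Flatten, nn.ReLU)):
        acts.append(h)

loss = criterion(h, y) + alpha * branching_penalty(acts)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```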
Discussion
Analyzing 80 diverse DNNs developed over the last decade, we uncover two key ingredients for high-performance deep learning: network size and critical neuron dynamics. We find that modern deep learning techniques implicitly enhance criticality, driving recent advances in the field. We show how improved DNN architectures and training approaches promote criticality, and introduce a novel training method that enforces criticality during training, significantly boosting accuracy on MNIST. Our method also enhances network plasticity, improving adaptability to new information in continual learning. We expect these findings to generalize to other models and tasks, offering a path toward more efficient AI.
Acknowledgements
References
1. Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 9.
2. https://doi.org/10.1038/nature14539
3. https://doi.org/10.1523/JNEUROSCI.23-35-11167.2003
4. https://doi.org/10.1016/0167-2789(90)90064-V
5. https://doi.org/10.1523/JNEUROSCI.3864-09.2009
6. https://doi.org/10.1103/PhysRevLett.94.058101
7. https://doi.org/10.1103/PhysRevLett.132.057301
8. https://doi.org/10.1109/CVPR.2009.5206848