P050 Linking biological context to models with ModelDB
Inessa Cohen1, Xincheng Cai2, Mengmeng Du2, Yiting Kong2, Hongyi Yu2, Robert A. McDougal*1,2,3,4
1Program in Computational Biology and Biomedical Informatics, Yale University, New Haven, CT, USA
2Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
3Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
4Wu Tsai Institute, Yale University, New Haven, CT, USA
*Email: robert.mcdougal@yale.edu
Introduction
ModelDB (https://modeldb.science) was founded almost 30 years ago to address the challenges of reproducibility in computational neuroscience, promote code reuse, and facilitate model discovery. It has grown to hold the source code for ~1,900 published studies. Recent enhancements, presented here, have focused on expanding its model collection and improving its biological context. However, discoverability and interpretability depend on having reliable metadata for entire models and their components. To address this, we sought to use machine learning (ML) to classify ion channel subtypes based on source code, identify key predictors, and compare the results with those from a large language model (LLM).
Methods
We applied manual and automatic techniques to increase the biological context displayed when exploring ModelDB and to improve the visibility of existing data. Network model properties and some file-level ion channel types were manually annotated. Biology-focused explanations of model files were generated automatically by an LLM. Features were extracted using a rule-based approach from NEURON [1] MOD files (a common format for ion channel and receptor model components) after deduplication that ignored white space and comments. Five-fold cross-validation was used to assess the ML predictions. Subsets of model code drawn from many files, together with a controlled vocabulary, were provided to an LLM to generate whole-model metadata, which was assessed manually.
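The deduplication step described above can be sketched as follows. This is a minimal illustration, not ModelDB's actual pipeline: it assumes the standard NMODL comment conventions (`:` for line comments and `COMMENT`/`ENDCOMMENT` blocks) and treats two MOD files as duplicates when they hash identically after comments are stripped and white space is collapsed.

```python
import hashlib
import re

def normalize_mod(source: str) -> str:
    """Strip NMODL comments and collapse white space so files that
    differ only in formatting or comments normalize identically."""
    # Remove COMMENT ... ENDCOMMENT blocks
    source = re.sub(r"COMMENT.*?ENDCOMMENT", "", source, flags=re.DOTALL)
    # Remove line comments introduced by ':'
    source = re.sub(r":[^\n]*", "", source)
    # Collapse every run of white space to a single space
    return " ".join(source.split())

def dedup_key(source: str) -> str:
    """Hash of the normalized source, usable as a deduplication key."""
    return hashlib.sha256(normalize_mod(source).encode()).hexdigest()

# Two formatting/comment variants of the same mechanism collide:
a = "NEURON { SUFFIX kca }  : a comment\n"
b = "NEURON {\n  SUFFIX kca\n}\nCOMMENT\nauthor notes\nENDCOMMENT\n"
print(dedup_key(a) == dedup_key(b))  # True
```

Hashing the normalized text keeps the comparison cheap even across thousands of files, at the cost of missing near-duplicates that differ by, say, a renamed variable.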
Results
We have updated the ModelDB website to support more types of models and to pair browsing of models and files with biological and computational context. The ML classifier identified several features (state count, nonspecific currents, and the use of common ions) as key predictors of ion channel type. It performed well at identifying broad channel types but struggled with more granular subtypes, which had few examples in our training set. Calcium-activated potassium channels were among the best-performing subtypes. ML results were compared with those from an LLM and from rule-based approaches. LLM performance on whole-model metadata prediction from source code was highly dependent on the broad category of metadata.
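The five-fold cross-validation used to assess these predictions can be sketched generically. This is a stdlib-only illustration under stated assumptions: the `fit`/`predict` pair, the majority-class baseline, and the synthetic labels are placeholders, not the classifier or features actually used in this work.

```python
import random
from collections import Counter

def five_fold_indices(n, seed=0):
    """Shuffle 0..n-1 and deal the indices into five folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::5] for i in range(5)]

def cross_validate(X, y, fit, predict):
    """Per-fold accuracy: train on four folds, test on the fifth."""
    accs = []
    for fold in five_fold_indices(len(X)):
        held_out = set(fold)
        X_train = [x for i, x in enumerate(X) if i not in held_out]
        y_train = [t for i, t in enumerate(y) if i not in held_out]
        model = fit(X_train, y_train)
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return accs

# Example: a majority-class baseline on synthetic labels. With rare
# subtypes (here class 1), even a strong-looking mean accuracy can
# hide poor performance on the minority class.
X = list(range(20))
y = [0] * 15 + [1] * 5
fit = lambda X_train, y_train: Counter(y_train).most_common(1)[0][0]
predict = lambda model, x: model
accs = cross_validate(X, y, fit, predict)
print(sum(accs) / len(accs))
```

The imbalance in the toy labels mirrors the abstract's observation that subtypes with few training examples are the hard cases.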
Discussion
ModelDB has long prioritized connecting models to biology, from its days as part of the SenseLab project, where its sister site NeuronDB [2] once gathered compartment-level channel expression data. Many model submitters now choose to contribute an “experimental motivation” when submitting new models. For both biology and model code, it is often unclear what should count as “the same,” posing challenges for both manual and automated metadata assignment. Nevertheless, it is our hope that pairing code with enriched biological context will make computational models more accessible, interpretable, and reusable.
Acknowledgements
We thank Rui Li for curating ModelDB model network metadata.
References
1. Hines, M. L., & Carnevale, N. T. (1997). The NEURON simulation environment. Neural Computation, 9(6), 1179-1209. https://doi.org/10.1162/neco.1997.9.6.1179
2. Mirsky, J. S., Nadkarni, P. M., Healy, M. D., Miller, P. L., & Shepherd, G. M. (1998). Database tools for integrating and searching membrane property data correlated with neuronal morphology. Journal of Neuroscience Methods, 82(1), 105-121. https://doi.org/10.1016/S0165-0270(98)00049-1