BioRxiv pic: MAGELLAN: Automated Generation of Interpretable Computational Models for Biological Reasoning

Published on 23 May 2025 at 17:53

MAGELLAN: Automated Generation of Interpretable Computational Models for
Biological Reasoning

Matthew A. Clarke, Charlie George Barker, Yuxin Sun, Theodoros I. Roumeliotis, Jyoti S. Choudhary and Jasmin Fisher

MAGELLAN (Message pAssing Graph nEuraL biomodeLAnalyzer Network) introduces a methodology that aims to advance hypothesis testing in biology by automating the generation of mechanistically interpretable models from biological pathway data and experimental results. The stated goals are to save money and resources and to generate new knowledge, all without the need to code. So expectations were high.

💡 Key strength

MAGELLAN’s key strength lies in its ability to generate mechanistically interpretable models that combine prior biological knowledge with qualitative experimental rules. Unlike black-box AI, it produces transparent, rule-based networks that can be simulated and understood by biologists. It is fast, scalable, and robust to noisy or incomplete data, making it well suited to real-world biological systems. MAGELLAN claims to be accessible to non-programmers through its integration with BioModelAnalyzer (BMA), and it guides experimental design by highlighting knowledge gaps and suggesting informative perturbations. In practice, however, this is not intuitive: the need to extract key pathway parameters from public databases, the incomplete documentation of the input data, and the hard-to-follow instructions on GitHub make the tool hard to use at this point. I am hoping for a revised and further developed version. The BMA integration is definitely a PLUS.

🧠 How It Works

MAGELLAN builds interpretable biological models by combining a prior knowledge network (e.g., from KEGG or SIGNOR), where nodes represent biological entities (e.g., genes or proteins), with qualitative experimental rules (e.g., “if GR is active, IL6 decreases”). Edges in the network indicate the regulatory relationships between these entities. MAGELLAN transforms this network into a layered Graph Neural Network (GNN) that models the temporal progression of signaling events and optimizes it using the provided rules. The result is a dynamic, rule-based model that can both explain and predict system behavior.
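To make the idea of a layered, rule-checked network more concrete, here is a minimal sketch in Python. It is not MAGELLAN's code or API: the three-edge toy network, the node names other than GR and IL6, the update function, and all function names are assumptions made purely for illustration. Each loop iteration plays the role of one layer (one time step), and the example rule from the text ("if GR is active, IL6 decreases") is checked by comparing two simulated perturbations.

```python
import numpy as np

# Hypothetical toy network, not taken from KEGG, SIGNOR, or the MAGELLAN repo.
# Each edge is (source, target, sign): +1 activates, -1 represses.
EDGES = [("GR", "NFKB", -1), ("TNF", "NFKB", +1), ("NFKB", "IL6", +1)]
NODES = sorted({name for src, dst, _ in EDGES for name in (src, dst)})
INDEX = {name: i for i, name in enumerate(NODES)}


def signed_adjacency(edges):
    """Return W with W[target, source] = sign of the regulatory edge."""
    w = np.zeros((len(NODES), len(NODES)))
    for src, dst, sign in edges:
        w[INDEX[dst], INDEX[src]] = sign
    return w


def simulate(w, clamped, layers=4):
    """Apply `layers` rounds of message passing; each round is one time step."""
    state = np.full(len(NODES), 0.5)          # 0.5 = assumed baseline activity
    for name, value in clamped.items():
        state[INDEX[name]] = value
    regulated = np.abs(w).sum(axis=1) > 0     # nodes with at least one input
    for _ in range(layers):
        incoming = w @ (state - 0.5)          # signed deviation from baseline
        updated = np.clip(0.5 + incoming, 0.0, 1.0)
        state = np.where(regulated, updated, state)
        for name, value in clamped.items():   # perturbed nodes stay fixed
            state[INDEX[name]] = value
    return dict(zip(NODES, state))


def rule_gr_lowers_il6(w):
    """Qualitative rule: IL6 is lower when GR is active than when it is not."""
    gr_on = simulate(w, {"GR": 1.0, "TNF": 1.0})
    gr_off = simulate(w, {"GR": 0.0, "TNF": 1.0})
    return bool(gr_on["IL6"] < gr_off["IL6"])


W = signed_adjacency(EDGES)
print("Rule satisfied:", rule_gr_lowers_il6(W))
```

This sketch only checks a fixed network against one rule. The optimization step described above, which adjusts the model so that the provided rules hold, is deliberately left out.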

Figure: Schematic overview of MAGELLAN. Prior knowledge networks—whether from KEGG, SIGNOR, or self-curated sources—are transformed into layered Graph Neural Networks (GNNs) and optimized using simple qualitative rules (e.g., activation or repression). These layers represent a temporal progression of regulatory events. The resulting model can be exported, visualized, and interactively tested in BioModelAnalyzer (BMA).

⚠️ Limitations

Conservative Predictions:
- MAGELLAN tends to under-predict synergistic effects in drug combinations compared to manually tuned models. While fast, MAGELLAN may miss nuanced behaviours captured through expert knowledge.
Real-World Data Challenges:
- Performance drops slightly when transitioning from synthetic benchmarks to noisy, real-world datasets.
No Direct Use of Quantitative Data:
- Currently optimized for discrete qualitative inputs, which could limit integration with some high-resolution omics datasets.
High barrier for non-computational users (as of the current state):
- The need to format networks, write specifications, and run models manually is a significant obstacle for non-coding biologists, and the missing documentation makes it harder still.
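To illustrate what the "No Direct Use of Quantitative Data" point means in practice, the sketch below reduces continuous measurements to three qualitative levels. It is an assumption-laden illustration, not a step taken from MAGELLAN: the function name, the log2 fold-change input, and the threshold of 1.0 are all hypothetical. It shows how much resolution is lost when a quantitative readout has to be expressed as "up", "unchanged", or "down".

```python
import numpy as np


def to_qualitative(log2_fold_changes, threshold=1.0):
    """Map continuous values to -1 (down), 0 (unchanged), or +1 (up).

    The threshold is an arbitrary illustrative choice, not a MAGELLAN default.
    """
    values = np.asarray(log2_fold_changes, dtype=float)
    levels = np.zeros(values.shape, dtype=int)
    levels[values >= threshold] = 1
    levels[values <= -threshold] = -1
    return levels


# Made-up example values, not data from the preprint.
measurements = [2.7, 1.1, 0.3, -0.2, -1.4, -3.9]
levels = to_qualitative(measurements)
print(levels.tolist())
```

After binning, a strong change (2.7) and a modest one (1.1) fall into the same level, which is the kind of information a high-resolution omics dataset would otherwise provide.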

🔭 Final Thoughts

MAGELLAN provides a compelling alternative to black-box machine learning methods by offering mechanistically interpretable models. However, its claimed “user-friendliness” falls short, especially for non-coding biologists. Significant improvements in documentation are needed, covering every step from pathway extraction through to model execution, along with real-world, end-to-end examples, ideally in the form of a well-annotated Jupyter notebook. While its performance on noisy, real-world datasets remains to be fully demonstrated, MAGELLAN clearly has the potential to accelerate hypothesis generation and refinement for expert users who know the biological background well enough to fine-tune the models and reduce data noise.
