Abstract: Large language models (LLMs) have recently gained significant attention as a promising approach to accelerating scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, we introduce a structured explanation formalism for virtual cells that represents biological reasoning as mechanistic action graphs, enabling systematic verification and falsification. Building on this formalism, we propose VCR-Agent, a multi-agent framework that integrates biologically grounded knowledge retrieval with verifier-based filtering to generate and validate mechanistic reasoning autonomously. Using this framework, we release the VC-TRACES dataset, which consists of verified mechanistic explanations derived from the Tahoe-100M atlas. Empirically, we demonstrate that training with these explanations improves factual precision and provides a more effective supervision signal for downstream gene expression prediction. These results underscore the importance of reliable mechanistic reasoning for virtual cells, achieved by combining multi-agent generation with rigorous verification.
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2604.11661 [cs.LG] (or arXiv:2604.11661v3 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2604.11661 (arXiv-issued DOI via DataCite) |
From: Yunhui Jang
[v1] Mon, 13 Apr 2026 16:10:44 UTC (440 KB)
[v2] Tue, 14 Apr 2026 04:56:30 UTC (440 KB)
[v3] Wed, 20 May 2026 12:43:59 UTC (440 KB)