
























Abstract: Large Language Models (LLMs) are central to the one-for-all intelligent paradigm, but they face a fundamental challenge when dealing with heterogeneous scientific data such as molecules: the inherent gap between discrete linguistic symbols and topological molecular or continuous reaction data leads to significant information loss and semantic noise in text-based reasoning. We propose SciCore-Mol, a modular framework that bridges this gap through three deeply integrated, pluggable cognitive modules: a topology-aware perception module, a latent diffusion-based molecular generation module, and a reaction-aware reasoning module. Each module is coupled to the LLM backbone through learned representation interfaces, enabling richer information exchange than is possible with text-only tool feedback. Our experiments on diverse chemical tasks demonstrate that SciCore-Mol achieves strong overall performance across molecular understanding, generation, reaction prediction, and general chemistry knowledge, with an 8B-parameter open-source system that is competitive with, and in several dimensions surpasses, proprietary large models. This work provides a systematic blueprint for equipping LLMs with scientific expertise through decoupled, pluggable, and flexibly orchestrated modules, with direct implications for drug design, chemical synthesis, and broader scientific discovery.
| Comments: | 15 pages, 4 figures, 9 tables. Preprint |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.22287 [cs.AI] |
| (or arXiv:2605.22287v1 [cs.AI] for this version) | |
| https://doi.org/10.48550/arXiv.2605.22287 arXiv-issued DOI via DataCite (pending registration) |
From: Yuxuan Chen
[v1] Thu, 21 May 2026 10:37:53 UTC (1,800 KB)