





















Abstract: Large language models (LLMs) have demonstrated promising performance in generating diagnostic conclusions from imaging findings, thereby supporting radiology reporting, trainee education, and quality control. However, systematic guidance on how to optimize prompt design across different clinical contexts remains underexplored. Moreover, a comprehensive and standardized framework for assessing the trustworthiness of LLM-generated radiology reports is yet to be established. This study aims to enhance the trustworthiness of LLM-generated liver MRI reports by introducing a Multi-Dimensional Credibility Assessment (MDCA) framework and providing guidance on institution-specific prompt optimization. The proposed framework is applied to evaluate and compare the performance of several advanced LLMs, including Kimi-K2-Instruct-0905, Qwen3-235B-A22B-Instruct-2507, DeepSeek-V3, and ByteDance-Seed-OSS-36B-Instruct, using the SiliconFlow platform.
Comments: 10 pages, 6 figures, 4 tables
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2510.23008 [cs.AI] (or arXiv:2510.23008v3 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2510.23008 (arXiv-issued DOI via DataCite)
From: Qiuli Wang
[v1] Mon, 27 Oct 2025 04:57:20 UTC (2,700 KB)
[v2] Tue, 28 Oct 2025 02:12:09 UTC (2,700 KB)
[v3] Mon, 25 May 2026 08:27:58 UTC (2,701 KB)