





















Authors: Zhaokun Yan, Shan Xu, Wuzheng Dong, Zhaohan Liu, Lijie Feng, Chengxiao Dai, Chen Tianqi, Binfan Liu, Yunpu Ma, Wenting Wei, Yingting Li, Yi Zhang, Tongning Wu
Abstract: Public health reasoning requires population-level inference grounded in scientific evidence, expert consensus, and safety constraints. However, it remains underexplored as a structured machine learning problem, with limited supervised signals and benchmarks. We introduce GlobalHealthAtlas, a large-scale multilingual dataset of 280,210 instances spanning 15 public health domains and 17 languages. We further propose a large language model (LLM)-assisted construction and quality control pipeline with retrieval, deduplication, evidence grounding checks, and label validation to improve consistency at scale. Finally, we present a domain-aligned evaluator distilled from high-confidence judgments of diverse LLMs to assess outputs along six dimensions: Accuracy, Reasoning, Completeness, Consensus Alignment, Terminology Norms, and Insightfulness. Together, these contributions enable reproducible training and evaluation of LLMs for safety-critical public health reasoning beyond conventional QA benchmarks. We publicly release the project codebase, evaluator, and model at: this https URL, this https URL, and this https URL
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2602.00491 [cs.CL] (or arXiv:2602.00491v2 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2602.00491 (arXiv-issued DOI via DataCite) |
| Journal reference: | ICML 2026 regular |
From: Zhaokun Yan
[v1] Sat, 31 Jan 2026 03:29:30 UTC (525 KB)
[v2] Sat, 23 May 2026 01:24:59 UTC (1,936 KB)