




















Authors: Weiya Li, Zhiwei Tang, Yizhou He, Chenghao Wang, Liang Feng, Xiao Sun, Dongrui Liu, Zichen Wen, Hu Wei, Jinghang Wang, Yi Luo, Li Guo, Linfeng Zhang
Abstract: With the rapid advancement of Deep Research Agents (DRAs) in knowledge-intensive domains such as finance, establishing reliable and domain-aligned evaluation standards remains a critical challenge. Existing benchmarks focus on either closed-ended question answering or open-ended report evaluation, and so fail to jointly capture the retrieval-reasoning accuracy and end-to-end research quality required in real-world workflows. We introduce ICBCBench, a consortium-driven benchmark for financial deep research, developed in collaboration with over 50 domain experts from more than 40 financial institutions and academic organizations. It adopts a dual-track paradigm that combines objective tasks with verifiable answers and subjective long-form report evaluation. The objective track assesses retrieval-reasoning accuracy, while the subjective track assesses end-to-end report quality in terms of expert alignment, citation consistency, and source quality. Experiments on state-of-the-art DRAs and large language models reveal substantial gaps in complex reasoning, factual grounding, and report quality, highlighting the challenges of achieving industry-level performance. Our dataset and evaluation framework are available at this https URL.
From: Weiya Li
[v1] Tue, 16 Jun 2026 03:19:20 UTC (972 KB)