惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

NISL@THU
NISL@THU
Vercel News
Vercel News
aimingoo的专栏
aimingoo的专栏
P
Proofpoint News Feed
Stack Overflow Blog
Stack Overflow Blog
T
Tailwind CSS Blog
云风的 BLOG
云风的 BLOG
L
LangChain Blog
有赞技术团队
有赞技术团队
Last Week in AI
Last Week in AI
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
Microsoft Azure Blog
Microsoft Azure Blog
WordPress大学
WordPress大学
博客园 - 司徒正美
宝玉的分享
宝玉的分享
F
Full Disclosure
Microsoft Security Blog
Microsoft Security Blog
The GitHub Blog
The GitHub Blog
V
Visual Studio Blog
B
Blog
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
Y
Y Combinator Blog
I
InfoQ
酷 壳 – CoolShell
酷 壳 – CoolShell
Engineering at Meta
Engineering at Meta
博客园 - 聂微东
博客园 - Franky
MyScale Blog
MyScale Blog
H
Hackread – Cybersecurity News, Data Breaches, AI and More
T
The Blog of Author Tim Ferriss
月光博客
月光博客
H
Help Net Security
B
Blog RSS Feed
人人都是产品经理
人人都是产品经理
V
V2EX
罗磊的独立博客
小众软件
小众软件
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
大猫的无限游戏
大猫的无限游戏
N
Netflix TechBlog - Medium
A
About on SuperTechFans
Apple Machine Learning Research
Apple Machine Learning Research
Hugging Face - Blog
Hugging Face - Blog
S
SegmentFault 最新的问题
D
Docker
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
The Cloudflare Blog
量子位
Jina AI
Jina AI
博客园_首页

cs.AI updates on arXiv.org

暂无文章

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang, Zerui Chen, · 2026-05-28 · via cs.AI updates on arXiv.org

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. The lack of self-awareness leads to severe \textbf{over-search}, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search, while maintaining accuracy. Our code and implementation details are released at https://github.com/XMUDeepLIT/SAAS.