

cs.AI updates on arXiv.org


Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving
Chuxue Cao, Mengze Li, Juntao Dai, Jinluan Yang, Zijian Zhao, Sh · 2025-06-21 · via cs.AI updates on arXiv.org

Large language models (LLMs) have shown promising first-order logic (FOL) reasoning capabilities across a range of applications. However, their effectiveness in complex mathematical reasoning that requires multi-step FOL deductions remains under-researched. Although LLMs perform competitively on established mathematical reasoning benchmarks, they struggle with multi-step FOL tasks. For example, DeepSeek-Prover-V2-7B achieves only 4.2% accuracy on our proposed theorem-proving dataset. This weakness stems from two factors: limited exploration of diverse proof strategies, and the tendency of early reasoning mistakes to undermine entire proofs. To address these issues, we propose DREAM, a self-adaptive solution that enhances the Diversity and REAsonability of LLMs' generation strategies. DREAM has two components. An Axiom-Driven Strategy Diversification mechanism promotes varied strategic outcomes. A Sub-Proposition Error Feedback mechanism helps LLMs reflect on and correct their proofs. Our contributions are: pioneering advances in LLMs' mathematical reasoning through FOL theorem proving; a novel inference-stage solution that improves performance by 0.6% to 6.4%; and a curated dataset of 447 mathematical theorems in Lean 4 format for evaluation.
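To make "multi-step FOL deduction in Lean 4 format" concrete, here is a small illustrative theorem written for this note. It is not drawn from the paper's 447-theorem dataset, and the name `exists_chain` is hypothetical. It combines an existential hypothesis with two universally quantified implications. Each `have` states an intermediate sub-proposition. If any step is wrong, Lean reports the error at that specific step. A sub-proposition-level feedback mechanism could plausibly use that kind of localized signal. The abstract does not specify how DREAM actually extracts or uses such feedback.

```lean
-- Illustrative multi-step FOL theorem (hypothetical; not from the DREAM dataset).
-- From "some x satisfies P", "every P is Q", and "every Q is R",
-- conclude "some x satisfies R".
theorem exists_chain {α : Type} (P Q R : α → Prop)
    (hPQ : ∀ x, P x → Q x) (hQR : ∀ x, Q x → R x)
    (h : ∃ x, P x) : ∃ x, R x := by
  cases h with
  | intro a ha =>
    -- Sub-proposition 1: Q a follows from P a
    have hQa : Q a := hPQ a ha
    -- Sub-proposition 2: R a follows from Q a
    have hRa : R a := hQR a hQa
    -- Close the goal with the witness a
    exact ⟨a, hRa⟩
```

The proof uses only core Lean 4 tactics (`cases`, `have`, `exact`), so it should check without Mathlib. A mistake in the first `have`, such as swapping the hypotheses, would be flagged there. The prover would not have to wait until the final step to learn that something went wrong.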