惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Help Net Security
T
ThreatConnect
SecWiki News
SecWiki News
F
Future of Privacy Forum
AWS News Blog
AWS News Blog
C
Cisco Blogs
A
Arctic Wolf
Vercel News
Vercel News
The GitHub Blog
The GitHub Blog
Scott Helme
Scott Helme
V
V2EX
博客园 - 叶小钗
阮一峰的网络日志
阮一峰的网络日志
K
Kaspersky official blog
G
Google Developers Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
P
Privacy International News Feed
C
Cyber Attacks, Cyber Crime and Cyber Security
N
News | PayPal Newsroom
Schneier on Security
Schneier on Security
NISL@THU
NISL@THU
Microsoft Azure Blog
Microsoft Azure Blog
量子位
The Hacker News
The Hacker News
Stack Overflow Blog
Stack Overflow Blog
Security Latest
Security Latest
M
Microsoft Research Blog - Microsoft Research
Google Online Security Blog
Google Online Security Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
I
InfoQ
Google DeepMind News
Google DeepMind News
Y
Y Combinator Blog
The Cloudflare Blog
Microsoft Security Blog
Microsoft Security Blog
Martin Fowler
Martin Fowler
Cisco Talos Blog
Cisco Talos Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
Troy Hunt's Blog
F
Fox-IT International blog
S
Security @ Cisco Blogs
博客园 - 司徒正美
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
C
Comments on: Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
L
LINUX DO - 最新话题
GbyAI
GbyAI
Project Zero
Project Zero
腾讯CDC
T
Tailwind CSS Blog

cs.AI updates on arXiv.org

Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform From Accuracy to Auditability: A Survey of Determinism in Financial AI Systems Saturating Scaling Laws for Equational Discovery: A Phenomenology of Growth Dynamics in Three Toy Substrates with Two Real-World Replications DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors Beyond Predefined Learning Objects: A Thinking-Learning Interaction Model for Up-to-Date Autonomous Robot Learning Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof Reason--Imagine--Act: Closed-Loop LLM Decision Making with World Models for Autonomous Driving Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning EvoCode-Bench: Evaluating Coding Agents in Multi-Turn Iterative Interactions Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game Low-Cost Labels, Reliable Choices: Rollout-Calibrated Hyper-Heuristics for Job Shop Scheduling Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems Right-Sizing Communication and Recommendation Set Size in AI-Assisted Search Machine Psychometrics: A Mathematical Psychology of Artificial Intelligence LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security Breaking the Chains of Probability: Neutrosophic Logic as a New Framework for Epistemic Uncertainty in Large Language Models QUIVER: A Formal Framework for Quantifying Perturbation Propagation and Bifurcation in Compound AI Systems MEMOR-E: In-Context and Fine-Tuned LLM Personalization for Alzheimer's Assistive Robotics BoxLitE: A Faithful Knowledge Base Embedding Based on Convex Optimization Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning Accelerating Long-Tail Generation in Synchronous RLHF Training via Adaptive Tensor Parallelism BODHI: Precise OS Kernel Specification Inference Spacetime Formation under Requirements: Contextual Realization and Form-Dependent Probability LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills Stop Comparing LLM Agents Without Disclosing the Harness A Dynamical Framework for Cognitive Processes Based on Transformations and Semantic Equivalence When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure Confidence Calibration in Large Language Models Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs Fuzzy, Neutrosophic, and Uncertain Graph Theory: Properties and Applications In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test SPACENUM: Revisiting Spatial Numerical Understanding in VLMs SSDAU: Structured Semantic Data Augmentation for Joint Entity and Relation Extraction EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation ETCHR: Editing To Clarify and Harness Reasoning MedExpMem: Adapting Experience Memory for Differential Diagnosis SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion KPI2KVI: A Multi Agent Workflow for Calculating Key Value Indicators from Service Descriptions Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems Human-Centered Learning Mechanics: A Dynamical Framework for Entropy-Regulated Representation Learning Computable Fairness: Boltzmann-Softmax Control for AI Resource Allocation Anytime Training with Schedule-Free Spectral Optimization One-Forcing: Towards Stable One-Step Autoregressive Video Generation Solving the Aircraft Disassembly Scheduling Problem Beyond Binary Edits Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision? Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals DART: Semantic Recoverability for Structured Tool Agents CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs ObjectCache: Layerwise Object-Storage Retrieval for KV Cache Reuse When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions FastKernels: Benchmarking GPU Kernel Generation in Production A mathematical theory of balancing relational generalization and memorization Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection Ontological Knowledge Blocks: Executable Compliance and Profile-Based Validation for Trustworthy AI Systems DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling LLM Code Smells: A Taxonomy and Detection Approach RAG4Outcome: A Retrieval-Augmented Multimodal Framework for Prognostic Prediction in Chronic Osteomyelitis Tensor Cache: Eviction-conditioned Associative Memory for Transformers One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents An AI-Driven Framework for Energy-Efficient Environmental Monitoring in Smart Cities Using Edge Intelligence LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection The TIME Machine: On The Power of Motion for Efficient Perception Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering CP or DP? Why Not Both: A Case Study in the Partial Shop Scheduling Problem OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills Human-in-the-Loop Multi-Agent Ventilator Decision Support with Contextual Bandit Preference Learning PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations Robots That Know What to Ask: Recovering Misaligned Rewards through Targeted Explanations Lipschitz Optimization for Formal Verification of Homographies Expressive Power of Deep Homomorphism Networks over Relational Databases Autonomous Frontier-Based Exploration with VLM Guidance Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models Approximate Machine Unlearning through Manifold Representation Forgetting Guided by Self Mode Connectivity EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation ChainFlow-VLA: Causal Flow Planning with Vision-Language Models Decomposing and Measuring Evaluation Awareness Uncovering the Latent Potential of Deep Intermediate Representations Multimodal Distribution Matching for Vision-Language Dataset Distillation Online Hand Gesture Recognition Using 3D Convolutional Neural Networks Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations SkillOpt: Executive Strategy for Self-Evolving Agent Skills Strategic Coercion Within Alliances: The Greenland Sovereignty Game as an AI Stress Test Multi-Gate Residuals Test-Time Training Undermines Safety Guardrails Exploiting Longitudinal Context in Clinician-Verified Interactive Lesion Tracking Agentic Proving for Program Verification PhenoYieldNet: Learning Crop-Aware Phenological Responses for Multi-Crop Yield Prediction Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution
EvoSci: A Bio-Inspired Multi-Agent Framework for the Evolution of Scientific Discovery
Xiaoyu Xiong · 2026-05-26 · via cs.AI updates on arXiv.org

View PDF HTML (experimental)

Abstract:Large language models (LLMs), have shown strong potential in scientific discovery, yet existing methods still face substantial challenges in the design of research workflows and multi-role collaboration mechanisms. To mitigate these issues, we propose EvoSci, a multi-agent scientific collaboration framework, which integrates bio-inspired evolution with knowledge graph modeling. To iteratively generate, evaluate, and refine research ideas, EvoSci incorporates multiple role-based agents, including mentor, researcher, and reviewer. By combining collaborative reasoning, shared memory, and evolutionary feedback, EvoSci significantly enhances the coherence and creativity of scientific exploration. Experiments on real-world research topics demonstrate that EvoSci significantly outperforms strong baselines in LLM-based structured peer-review and comparative ranking evaluations, achieving the highest overall peer-review score (ICLR 4.90) and top ranking (Top-10 = 54). These results suggest its superiority in both scientific idea generation and continuous discovery.
Comments: ACL 2026 Main Conference
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as: arXiv:2605.24018 [cs.AI]
  (or arXiv:2605.24018v1 [cs.AI] for this version)
  https://doi.org/10.48550/arXiv.2605.24018

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Xiaoyu Xiong [view email]
[v1] Wed, 20 May 2026 04:58:49 UTC (2,551 KB)