Understanding Conversational Patterns in Multi-agent Programming: A Case Study on Fibonacci Game Development
Srijita Basu · 2026-05-26 · via cs.AI updates on arXiv.org

Abstract: Large Language Models (LLMs) are increasingly applied to software engineering (SE), yet their potential for autonomous, role-oriented collaboration remains largely underexplored. Understanding how multiple LLM-based agents coordinate, maintain role alignment, and converge on solutions is critical for SE, because naively allowing agents to interact does not reliably lead to correct or stable outcomes. Recent empirical studies show that unstructured or poorly understood interaction dynamics can result in error propagation, premature consensus on incorrect solutions, or prolonged disagreement that prevents convergence, even when correct partial solutions are present early in the interaction. As an initial step towards addressing this gap, we undertake a systematic analysis of conversations between two agents, a Designer and a Programmer, across 12 model combinations drawn from 7 open-source LLMs (Gemma 2, Gemma 3, LLaMA 3.2, LLaMA 3.3, DeepSeek-R1, MiniCPM, and Qwen3). Our approach reveals three key dimensions of multi-agent interaction: efficiency (the speed and stability of convergence), consistency (the degree of role alignment, visualized by BLEU and ROUGE), and effectiveness (the extent of compilation success and error resolution). Results show that the DeepSeek-R1:DeepSeek-R1 pair was unique in converging to the correct solution from the very first iteration and sustaining it through the final iteration, while LLaMA 3.2:LLaMA 3.2 and Qwen3:Qwen3 demonstrated strong Designer:Programmer role alignment despite diverging from the correct solution. The other pairs deviated from the task and never converged on a result. These findings advance the understanding of agentic programming and highlight the need for further research on understanding and calibrating convergence and stop conditions, which are essential for future autonomous SE.
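To make the consistency and efficiency dimensions more concrete, here is a minimal Python sketch. It is not the authors' code, and the abstract does not specify how the metrics were computed. The sketch assumes a simplified unigram overlap as a stand-in for BLEU and ROUGE (clipped unigram precision in the spirit of BLEU-1 without a brevity penalty, and unigram recall in the spirit of ROUGE-1) and a hypothetical per-iteration correctness flag for judging convergence. The function names `unigram_overlap` and `first_stable_iteration` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Optional


def unigram_overlap(candidate: str, reference: str) -> Dict[str, float]:
    """Simplified stand-in for BLEU/ROUGE (an assumption, not the paper's metric).

    precision: clipped unigram precision (BLEU-1 style, no brevity penalty)
    recall:    unigram recall (ROUGE-1 style)
    f1:        harmonic mean of the two
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def first_stable_iteration(correct_flags: List[bool]) -> Optional[int]:
    """Return the 1-based iteration from which every remaining iteration
    is correct, or None if the run does not end in a correct state."""
    stable_from = None
    for i, ok in enumerate(correct_flags, start=1):
        if not ok:
            stable_from = None      # stability broken, start over
        elif stable_from is None:
            stable_from = i         # start of a new correct streak
    return stable_from


if __name__ == "__main__":
    # Hypothetical Designer and Programmer messages from one iteration.
    designer = "print the fibonacci sequence"
    programmer = "print fibonacci numbers"
    print(unigram_overlap(programmer, designer))

    # Hypothetical correctness flags per iteration for two agent pairs.
    print(first_stable_iteration([True, True, True]))    # correct throughout
    print(first_stable_iteration([True, False, False]))  # drifts away
```

Under this reading, a pair that is correct from the first iteration to the last yields a stable iteration of 1, while a pair that drifts away yields `None`. High overlap scores alongside a `None` result would correspond to strong role alignment without convergence on the correct solution.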
Comments: 10 pages, 7 figures, AIware, FSE 2026
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.24138 [cs.SE]
  (or arXiv:2605.24138v1 [cs.SE] for this version)
  https://doi.org/10.48550/arXiv.2605.24138

arXiv-issued DOI via DataCite (pending registration)

Related DOI: https://doi.org/10.1145/3805760.3814914

Submission history

From: Srijita Basu Dr.
[v1] Fri, 22 May 2026 18:56:47 UTC (6,656 KB)