Analyzing the Effects of Two-Stage Peer Evaluation
Roy Fairstei · 2026-05-26 · via cs updates on arXiv.org

Abstract: Peer-evaluation and selection systems are used when a set of agents evaluate each other in order to select the best $k$ among them. Such systems are common in real-world settings, including academic conferences, where the reviewers are often drawn from the set of submitters. Conferences have attempted to allocate their reviewing resources better by moving to a two-stage mechanism, in which some papers are eliminated after a first stage of review and the remaining papers receive additional reviewers. We investigate how two major strategyproof peer selection mechanisms, Partition and ExactDollarPartition, perform when adapted to a two-stage system, in order to understand the effect of the two-stage mechanism on which agents get selected. We also examine how the various parameters of the two-stage mechanism influence the outcome. We provide a theoretical basis by showing how a particular setting is influenced by the two stages. However, solving the general case seems implausible at the moment, so we use extensive simulations of different scenarios and settings to observe which agents benefit and which are harmed by adopting two-stage mechanisms, varying the mechanism's parameters as well. We show that the two-stage mechanism's advantage depends on the noisiness of reviewer beliefs: borderline agents benefit most in a low-noise environment, while high-rank agents benefit more in noisy environments. We also show that the effectiveness of these mechanisms is highly dependent on the number of chosen agents, the number of reviews requested from agents, and reviewers' correlation, indicating that organizers need to exercise caution when selecting these parameters for a reviewing process.
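To make the two-stage idea concrete, the following is a minimal sketch of a Partition-style selection with an elimination round. It is not the authors' implementation. The function name `two_stage_partition`, the Gaussian noise model, the round-robin cluster assignment, the `keep` fraction, and the assumption that `k` is divisible by the number of clusters are all illustrative choices that do not come from the abstract. The only property carried over from the Partition mechanism is that agents review only agents in other clusters and each cluster has a fixed quota, which is what is intended to keep an agent from influencing its own selection.

```python
import random
from statistics import mean


def two_stage_partition(quality, k, clusters=2, r1=2, r2=2,
                        keep=0.5, noise=0.1, seed=0):
    """Toy two-stage Partition-style selection (illustrative assumptions only)."""
    rng = random.Random(seed)
    n = len(quality)
    order = list(range(n))
    rng.shuffle(order)
    # Hypothetical assignment: round-robin over a random order.
    cluster_of = {a: i % clusters for i, a in enumerate(order)}
    reviews = {a: [] for a in range(n)}  # agent -> list of (reviewer, score)

    def add_reviews(targets, count):
        for a in targets:
            used = {rev for rev, _ in reviews[a]}
            # Reviewers are drawn only from other clusters.
            pool = [b for b in range(n)
                    if cluster_of[b] != cluster_of[a] and b not in used]
            for b in rng.sample(pool, min(count, len(pool))):
                # Assumed noise model: true quality plus Gaussian noise.
                reviews[a].append((b, quality[a] + rng.gauss(0, noise)))

    def score(a):
        return mean(s for _, s in reviews[a])

    def top(members, m):
        return sorted(members, key=score, reverse=True)[:m]

    # Stage 1: every agent gets r1 reviews; each cluster keeps a fixed share.
    add_reviews(range(n), r1)
    survivors = []
    for c in range(clusters):
        members = [a for a in range(n) if cluster_of[a] == c]
        quota = max(k // clusters, round(keep * len(members)))
        survivors += top(members, quota)

    # Stage 2: survivors get r2 extra reviews; each cluster selects k / clusters.
    add_reviews(survivors, r2)
    selected = []
    for c in range(clusters):
        members = [a for a in survivors if cluster_of[a] == c]
        selected += top(members, k // clusters)
    return selected, survivors, reviews, cluster_of


# Usage: 20 agents with random hidden quality, select k = 4.
quality_rng = random.Random(1)
quality = [quality_rng.random() for _ in range(20)]
selected, survivors, reviews, cluster_of = two_stage_partition(quality, k=4)
print(sorted(selected))
```

Varying `noise`, `k`, `r1`, and `r2` in a sketch like this is one way to explore the kind of parameter sensitivity the abstract describes, although the paper's actual simulation settings are not given here.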
Subjects: Computer Science and Game Theory (cs.GT)
MSC classes: 91A80, 91B10, 91B12, 91B14
ACM classes: J.4; I.2
Cite as: arXiv:2605.24222 [cs.GT]
  (or arXiv:2605.24222v1 [cs.GT] for this version)
  https://doi.org/10.48550/arXiv.2605.24222

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Harper Lyon
[v1] Fri, 22 May 2026 21:09:10 UTC (908 KB)