Mode-as-Sequence: Translating Multimodal Motion Prediction into Unified Sequential Mode Modeling
Zikang Zhou, · 2026-05-26 · via cs.AI updates on arXiv.org

Abstract: Multimodal motion forecasting is inherently under-supervised: each training scene provides only one realized future, yet multiple plausible futures exist. This sparse supervision often leads to mode collapse (redundant hypotheses and insufficient mode coverage) and unreliable confidence ranking when predicting a small set of trajectories. We propose Mode-as-Sequence, a unified decoding framework that translates an unordered mode set into an ordered mode sequence and explicitly models mode-to-mode dependency. Under this framework, we develop two complementary instantiations. ModeSeq performs recurrent mode decoding, where each mode is generated conditioned on the previously generated modes, encouraging diverse, non-redundant hypotheses with calibrated confidence ordering. To remove the mode-by-mode autoregressive bottleneck, we further propose Parallel ModeSeq, which preserves the same causal dependency using masked mode-to-mode self-attention while decoding all modes in a single forward pass, enabling efficient large-$K$ inference and scalable joint-scene prediction. To learn representative modes and calibrated confidence under sparse labels, we introduce Early-Match-Take-All (EMTA) and its joint-scene extension MA-EMTA, together with a lightweight ranking regularizer that reduces confidence inversions. Extensive experiments on large-scale benchmarks demonstrate consistent improvements in both ranking-oriented metrics and best-of-$K$ accuracy across datasets, horizons, and object types. In the Waymo Open Dataset challenges, ModeSeq achieves 1st place in the 2024 LiDAR-free motion prediction track, and Parallel ModeSeq achieves 1st place in the 2025 Interaction Prediction Challenge, validating the effectiveness of Mode-as-Sequence for both accuracy and efficiency.
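The masked mode-to-mode self-attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head with no learned projections, and the names `causal_mode_mask` and `masked_self_attention` are hypothetical. It only shows how a lower-triangular mask lets all modes be computed in one pass while each mode still depends only on itself and the modes ordered before it.

```python
import math


def causal_mode_mask(num_modes):
    """mask[i][j] is True when mode i may attend to mode j (only j <= i)."""
    return [[j <= i for j in range(num_modes)] for i in range(num_modes)]


def masked_self_attention(queries, keys, values, mask):
    """Single-head scaled dot-product attention over a list of mode embeddings."""
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Disallowed positions get -inf so their softmax weight is exactly 0.
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            if mask[i][j] else float("-inf")
            for j, key in enumerate(keys)
        ]
        peak = max(scores)  # finite, because the diagonal is always allowed
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        weights = [weight / total for weight in weights]
        outputs.append([
            sum(weight * value[d] for weight, value in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


# Toy embeddings for K = 3 modes (illustrative values, not from the paper).
modes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoded = masked_self_attention(modes, modes, modes, causal_mode_mask(3))
```

In this sketch, changing the third mode's embedding leaves the outputs for the first two modes unchanged, which is the same ordering constraint a recurrent, mode-by-mode decoder would impose, but obtained without a sequential loop over decoding steps.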
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.24037 [cs.CV]
  (or arXiv:2605.24037v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.24037

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Haibo Hu
[v1] Thu, 21 May 2026 11:37:17 UTC (2,618 KB)