惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

T
Tailwind CSS Blog
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
S
SegmentFault 最新的问题
U
Unit 42
C
Cyber Attacks, Cyber Crime and Cyber Security
Security Latest
Security Latest
L
LINUX DO - 最新话题
The Register - Security
The Register - Security
人人都是产品经理
人人都是产品经理
美团技术团队
PCI Perspectives
PCI Perspectives
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
W
WeLiveSecurity
F
Full Disclosure
Application and Cybersecurity Blog
Application and Cybersecurity Blog
Cloudbric
Cloudbric
L
LangChain Blog
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
M
MIT News - Artificial intelligence
S
Security @ Cisco Blogs
博客园 - 【当耐特】
Webroot Blog
Webroot Blog
Stack Overflow Blog
Stack Overflow Blog
C
Check Point Blog
Help Net Security
Help Net Security
NISL@THU
NISL@THU
WordPress大学
WordPress大学
Simon Willison's Weblog
Simon Willison's Weblog
月光博客
月光博客
C
CERT Recently Published Vulnerability Notes
博客园 - 三生石上(FineUI控件)
S
Securelist
博客园 - Franky
博客园 - 叶小钗
AWS News Blog
AWS News Blog
D
DataBreaches.Net
P
Proofpoint News Feed
小众软件
小众软件
C
Cybersecurity and Infrastructure Security Agency CISA
Hugging Face - Blog
Hugging Face - Blog
Engineering at Meta
Engineering at Meta
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
H
Hackread – Cybersecurity News, Data Breaches, AI and More
The GitHub Blog
The GitHub Blog
K
Kaspersky official blog
Vercel News
Vercel News
Google Online Security Blog
Google Online Security Blog
C
Cisco Blogs
S
Security Affairs

cs.AI updates on arXiv.org

Detecting Safety Violations Across Many Agent Traces C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Discourse Diversity in Multi-Turn Empathic Dialogue Evaluating Cooperation in LLM Social Groups through Elected Leadership SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents A Triadic Suffix Tokenization Scheme for Numerical Reasoning Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models Quantization Dominates Rank Reduction for KV-Cache Compression Anthropogenic Regional Adaptation in Multimodal Vision-Language Model Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method CocoaBench: Evaluating Unified Digital Agents in the Wild MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis Use of AI Tools: Guidelines to Maintain Academic Integrity in Computing Colleges Efficient Training for Cross-lingual Speech Language Models Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents Towards Proactive Information Probing: Customer Service Chatbots Harvesting Value from Conversation Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities Uncertainty-Aware Web-Conditioned Scientific Fact-Checking Min-$k$ Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies When Verification Fails: How Compositionally Infeasible Claims Escape Rejection Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning A molecular clock for writing systems reveals the quantitative impact of imperial power on cultural evolution Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis Speaking to No One: Ontological Dissonance and the Double Bind of Conversational AI Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series TInR: Exploring Tool-Internalized Reasoning in Large Language Models Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction Generating Multiple-Choice Knowledge Questions with Interpretable Difficulty Estimation using Knowledge Graphs and Large Language Models Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation Detecting RAG Extraction Attack via Dual-Path Runtime Integrity Game Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents Learning and Enforcing Context-Sensitive Control for LLMs Efficient Process Reward Modeling via Contrastive Mutual Information Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment NSFL: A Post-Training Neuro-Symbolic Fuzzy Logic Framework for Boolean Operators in Neural Embeddings Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models LLMs Should Incorporate Explicit Mechanisms for Human Empathy AI Patents in the United States and China: Measurement, Organization, and Knowledge Flows ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation A Progressive Training Strategy for Vision-Language Models to Counteract Spatio-Temporal Hallucinations in Embodied Reasoning Cooperation in Human and Machine Agents: Promise Theory Considerations CHAIRO: Contextual Hierarchical Analogical Induction and Reasoning Optimization for LLMs Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs PEMANT: Persona-Enriched Multi-Agent Negotiation for Travel From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation VeriSim: A Configurable Framework for Evaluating Medical AI Under Realistic Patient Noise CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation TrajOnco: a multi-agent framework for temporal reasoning over longitudinal EHR for multi-cancer early detection Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents VeriTrans: Fine-Tuned LLM-Assisted NL-to-PL Translation via a Deterministic Neuro-Symbolic Pipeline Zero-shot World Models Are Developmentally Efficient Learners From GPT-3 to GPT-5: Mapping their capabilities, scope, limitations, and consequences Gypscie: A Cross-Platform AI Artifact Management System TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale AI Organizations are More Effective but Less Aligned than Individual Agents Dead Cognitions: A Census of Misattributed Insights STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems The Amazing Agent Race: Strong Tool Users, Weak Navigators A Dual-Positive Monotone Parameterization for Multi-Segment Bids and a Validity Assessment Framework for Reinforcement Learning Agent-based Simulation of Electricity Markets SVSR: A Self-Verification and Self-Rectification Paradigm for Multimodal Reasoning Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models Edu-MMBias: A Three-Tier Multimodal Benchmark for Auditing Social Bias in Vision-Language Models under Educational Contexts Credit-Budgeted ICPC-Style Coding: When Agents Must Pay for Every Decision PoreDiT: A Scalable Generative Model for Large-Scale Digital Rock Reconstruction MAVEN-T: Reinforced Heterogeneous Distillation for Real-Time Multi-Agent Trajectory Prediction
Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction
Changan Chen, Jordi Ramos, Anshul Tomar, Kristen Grauman · 2024-05-05 · via cs.AI updates on arXiv.org

Sim2real transfer has received increasing attention lately due to the success of learning robotic tasks in simulation end-to-end. While there has been a lot of progress in transferring vision-based navigation policies, the existing sim2real strategy for audio-visual navigation performs data augmentation empirically without measuring the acoustic gap. The sound differs from light in that it spans across much wider frequencies and thus requires a different solution for sim2real. We propose the first treatment of sim2real for audio-visual navigation by disentangling it into acoustic field prediction (AFP) and waypoint navigation. We first validate our design choice in the SoundSpaces simulator and show improvement on the Continuous AudioGoal navigation benchmark. We then collect real-world data to measure the spectral difference between the simulation and the real world by training AFP models that only take a specific frequency subband as input. We further propose a frequency-adaptive strategy that intelligently selects the best frequency band for prediction based on both the measured spectral difference and the energy distribution of the received audio, which improves the performance on the real data. Lastly, we build a real robot platform and show that the transferred policy can successfully navigate to sounding objects. This work demonstrates the potential of building intelligent agents that can see, hear, and act entirely from simulation, and transferring them to the real world.