AI Olympics challenge with Evolutionary Soft Actor Critic
Marco Calì, Alberto Sinigaglia, Niccolò Turcato, Ruggero Carli · 2024-09-02 · via cs.AI updates on arXiv.org

This report describes the solution we propose for the AI Olympics competition held at IROS 2024. The solution combines a model-free deep reinforcement learning approach with an evolutionary strategy. We briefly describe the algorithms used and then provide details of the approach.
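As a rough illustration of how an evolutionary strategy can refine parameters that a reinforcement learning agent has already produced, the sketch below runs a generic (1+λ) Gaussian-perturbation search. It is not the authors' algorithm, which this excerpt does not specify. The names `evaluate` and `evolve`, the toy objective, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import random


def evaluate(params):
    """Hypothetical stand-in for an episode return (higher is better).

    A real setup would roll out the policy in a simulator; here the score
    is the negative squared distance to an arbitrary target vector.
    """
    target = [0.5, -1.0, 2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def evolve(initial_params, generations=200, population=16, sigma=0.1, seed=0):
    """Generic (1+lambda) evolutionary search around a starting vector.

    `initial_params` plays the role of weights from a pre-trained agent
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    parent = list(initial_params)
    parent_score = evaluate(parent)
    for _ in range(generations):
        # Sample `population` children by adding Gaussian noise to the parent.
        children = [
            [p + rng.gauss(0.0, sigma) for p in parent]
            for _ in range(population)
        ]
        # Keep the best child only if it beats the current parent (elitism).
        child_score, child = max(
            ((evaluate(c), c) for c in children), key=lambda sc: sc[0]
        )
        if child_score > parent_score:
            parent, parent_score = child, child_score
    return parent, parent_score


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    refined, score = evolve(start)
    print("initial score:", evaluate(start))
    print("refined score:", score)
```

Because the parent is replaced only when a child scores higher, the returned score can never fall below the starting score. That property makes this kind of search a low-risk refinement stage after gradient-based training, under the assumptions stated above.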