

RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models
Gabriele Mat · 2026-04-17 · via cs.CL updates on arXiv.org


Abstract: Tool learning with foundation models aims to endow AI systems with the ability to invoke external resources -- such as APIs, computational utilities, and specialized models -- to solve complex tasks beyond the reach of standalone language generation. While recent advances in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have expanded their reasoning and perception capabilities, existing tool-use methods are predominantly limited to text-only inputs and closed-world settings. Consequently, they struggle to interpret multimodal user instructions and cannot generalize to tools unseen during training. In this work, we introduce RaTA-Tool, a novel framework for open-world multimodal tool selection. Rather than learning direct mappings from user queries to fixed tool identifiers, our approach enables an MLLM to convert a multimodal query into a structured task description and subsequently retrieve the most appropriate tool by matching this representation against semantically rich, machine-readable tool descriptions. This retrieval-based formulation naturally supports extensibility to new tools without retraining. To further improve alignment between task descriptions and tool selection, we incorporate a preference-based optimization stage using Direct Preference Optimization (DPO). To support research in this setting, we also introduce the first dataset for open-world multimodal tool use, featuring standardized tool descriptions derived from Hugging Face model cards. Extensive experiments demonstrate that our approach significantly improves tool-selection performance, particularly in open-world, multimodal scenarios.
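The retrieval-based formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tool catalogue, the `embed` and `select_tool` helpers, and the bag-of-words similarity are all assumptions made for illustration. In the paper's setting the task description would be produced by an MLLM from a multimodal query and the matching would presumably use a learned encoder; here the task description is simply given as a string.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned text encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical tool catalogue: names and descriptions are invented for this sketch.
TOOLS = {
    "image-captioner": "Generate a text caption describing the content of an input image.",
    "speech-recognizer": "Transcribe spoken audio into written text.",
    "object-detector": "Detect and localize objects in an input image with bounding boxes.",
}


def select_tool(task_description, tools):
    """Return the name of the tool whose description best matches the task description."""
    query = embed(task_description)
    scores = {name: cosine(query, embed(desc)) for name, desc in tools.items()}
    return max(scores, key=scores.get)


# A task description as an MLLM might emit it (hand-written here).
print(select_tool("Transcribe the audio recording into text.", TOOLS))

# Open-world extension: a new tool is added by description only, with no retraining.
extended = dict(TOOLS, translator="Translate text from one language to another language.")
print(select_tool("Translate this sentence to another language", extended))
```

Because selection depends only on the similarity between the task description and each tool description, adding an entry to the catalogue is enough to make a new tool selectable, which is the extensibility property the abstract highlights.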
Comments: ICPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
Cite as: arXiv:2604.14951 [cs.CV]
  (or arXiv:2604.14951v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2604.14951

arXiv-issued DOI via DataCite

Submission history

From: Sara Sarto
[v1] Thu, 16 Apr 2026 12:47:09 UTC (1,864 KB)