VisRet: Visualization Improves Knowledge-Intensive Text-to-Image Retrieval
Di Wu, Yixin · 2026-04-17 · via cs.CL updates on arXiv.org

Abstract: Text-to-image retrieval (T2I retrieval) remains challenging because cross-modal embeddings often behave as bags of concepts, underrepresenting structured visual relationships such as pose and viewpoint. We propose Visualize-then-Retrieve (VisRet), a retrieval paradigm that mitigates this limitation of cross-modal similarity alignment. VisRet first projects textual queries into the image modality via T2I generation, then performs retrieval within the image modality, bypassing the weaknesses of cross-modal retrievers in recognizing subtle visual-spatial features. Across four benchmarks (Visual-RAG, INQUIRE-Rerank, Microsoft COCO, and our new Visual-RAG-ME, which features multi-entity comparisons), VisRet substantially outperforms both cross-modal similarity matching and baselines that recast T2I retrieval as text-to-text similarity matching, improving nDCG@30 by 0.125 on average with CLIP as the retriever and by 0.121 with E5-V. For downstream question answering, VisRet increases accuracy on Visual-RAG and Visual-RAG-ME by 3.8% and 15.7%, respectively, with top-1 retrieval, and by 3.9% and 11.1% with top-10 retrieval. Ablation studies show that VisRet is compatible with different T2I instruction LLMs, T2I generation models, and downstream LLMs. VisRet offers a simple yet effective perspective for advancing text-image retrieval. Our code and the new benchmark are publicly available at this https URL.
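The two-step idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the callables write_t2i_prompts, generate_image, and embed_image are hypothetical stand-ins for a T2I instruction LLM, a T2I generation model, and an image encoder (for example, the image tower of a CLIP-style model), and averaging scores over several generated images is an assumption of this sketch. What it shows is that, once the query has been visualized, ranking uses only image-to-image similarity, so no text embedding is ever compared against an image embedding.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def visualize_then_retrieve(
    query: str,
    corpus: Dict[str, Vector],
    write_t2i_prompts: Callable[[str], List[str]],
    generate_image: Callable[[str], object],
    embed_image: Callable[[object], Vector],
    top_k: int = 10,
) -> List[str]:
    """Rank corpus images for a text query entirely in image space.

    corpus maps image ids to embeddings produced by the same image encoder
    as embed_image. All three callables are hypothetical stand-ins.
    """
    # Step 1 (visualize): an instruction LLM rewrites the query into T2I
    # prompts, and a T2I model renders each prompt as an image.
    prompts = write_t2i_prompts(query)
    if not prompts:
        raise ValueError("expected at least one T2I prompt")
    query_vecs = [embed_image(generate_image(p)) for p in prompts]

    # Step 2 (retrieve): score corpus images by image-image similarity.
    # Averaging over generated images is an assumption, not necessarily
    # the fusion rule used in the paper.
    scores = {
        image_id: sum(cosine(q, vec) for q in query_vecs) / len(query_vecs)
        for image_id, vec in corpus.items()
    }
    return sorted(scores, key=lambda i: scores[i], reverse=True)[:top_k]


# Toy usage: "images" are strings and the encoder is a lookup table.
toy_vectors = {"img:side-view": [1.0, 0.0]}
corpus = {"a.jpg": [0.9, 0.1], "b.jpg": [0.1, 0.9]}
ranking = visualize_then_retrieve(
    "a heron seen from the side",
    corpus,
    write_t2i_prompts=lambda q: ["side view of a heron"],
    generate_image=lambda prompt: "img:side-view",
    embed_image=lambda img: toy_vectors[img],
    top_k=2,
)
print(ranking)  # ['a.jpg', 'b.jpg']
```

Because the three components are passed in as functions, swapping the instruction LLM, the generator, or the encoder leaves the ranking logic untouched, which loosely parallels the component-compatibility ablations the abstract mentions.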
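The retrieval gains above are reported in nDCG@30. As a reference point, here is a minimal sketch of nDCG@k using the common linear-gain formulation, DCG@k = sum over ranks i of rel_i / log2(i + 1), normalized by the DCG of the ideal ordering. The paper's exact evaluation code (gain function, tie handling, judged pool) may differ, so treat this as an assumption. For non-negative relevance labels the score lies between 0 and 1.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k results (linear gain)."""
    # enumerate starts at 0, so rank r contributes rel / log2(r + 2).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    relevances: Sequence[float],
    k: int,
    ideal_relevances: Optional[Sequence[float]] = None,
) -> float:
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    relevances are the labels of the returned items, in rank order.
    ideal_relevances should hold the labels of every judged item for the
    query; if omitted, the returned items themselves are used, which
    overestimates nDCG when relevant items were never retrieved.
    """
    pool = relevances if ideal_relevances is None else ideal_relevances
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


ranked = [0, 1, 0]  # binary labels; the only relevant image sits at rank 2
print(round(ndcg_at_k(ranked, k=30), 4))  # 0.6309
```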
Comments: ACL 2026 Camera Ready
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as: arXiv:2505.20291 [cs.CV]
  (or arXiv:2505.20291v4 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2505.20291 (arXiv-issued DOI via DataCite)

Submission history

From: Di Wu
[v1] Mon, 26 May 2025 17:59:33 UTC (28,254 KB)
[v2] Tue, 7 Oct 2025 07:50:24 UTC (32,498 KB)
[v3] Tue, 6 Jan 2026 18:46:16 UTC (34,685 KB)
[v4] Thu, 16 Apr 2026 05:19:04 UTC (34,692 KB)