Resolving Ambiguity in Composed Image Retrieval via Calibrated Interaction
Amsisan Tran · 2026-05-26 · via cs updates on arXiv.org


Abstract: Composed image retrieval (CIR) searches a corpus with a reference image and a text describing how to modify it. Despite rapid progress from triplet-trained compositors to zero-shot and generative methods, essentially all systems share one assumption: that a query maps to a single target, scored by Recall@K against one annotation. We argue this is fundamentally at odds with the task. A query such as "make it more formal" does not name an image but a region of the corpus, and which member the user intends is genuinely underdetermined. This underspecification is the root of the well-known false-negative problem and leaves current models unable to tell a precise query from an ambiguous one. We reframe CIR as calibrated intent resolution under uncertainty: a retriever is wrapped in a conformal prediction layer that returns a candidate set with a coverage guarantee and whose size is a principled measure of ambiguity; when the set is large, an expected-information-gain policy asks the single most useful clarifying question, drawn from interpretable ambiguity axes, and the set contracts. We introduce AmbiCIR, a benchmark and human-validated user simulator that revive the dormant auxiliary and dialogue annotations of CIRR and extend the multiple-positive setting of CIRCO. Across open-domain and fashion benchmarks, our method matches single-turn state of the art, confirming that calibrated resolution is cost-free on precise queries, while reaching the intended target in a fraction of the interaction budget required by naive conversational baselines, and it is the first to report valid coverage and calibration for the task.
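
The two mechanisms named in the abstract, a conformal candidate set and an expected-information-gain question, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes split conformal prediction with the nonconformity score 1 - similarity, a belief over candidates obtained by normalizing similarities, and a noiseless user, in which case the expected information gain of a question equals the entropy of its answer distribution. All function names, item names, scores, and attribute axes below are hypothetical.

```python
import math
from collections import defaultdict


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores.

    Each score is assumed to be 1 - similarity(query, true target).
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return float("inf")  # too few calibration points: keep every item
    return sorted(cal_scores)[k - 1]


def candidate_set(similarities, q_hat):
    """Keep every item whose nonconformity 1 - sim is within the threshold."""
    return {item for item, sim in similarities.items() if 1.0 - sim <= q_hat}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def best_question(candidates, similarities, attributes):
    """Pick the ambiguity axis with the highest expected information gain.

    Assumes a noiseless user, so the gain of asking about an axis is the
    entropy of the answer distribution under the current belief.
    """
    total = sum(similarities[c] for c in candidates)
    gains = {}
    for axis in {a for c in candidates for a in attributes[c]}:
        mass = defaultdict(float)
        for c in candidates:
            mass[attributes[c].get(axis)] += similarities[c] / total
        gains[axis] = entropy(mass.values())
    return max(gains, key=gains.get), gains


# Hypothetical calibration scores and one query ("make it more formal").
cal_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40]
q_hat = conformal_threshold(cal_scores, alpha=0.25)

similarities = {"blazer": 0.90, "suit": 0.85, "oxford_shirt": 0.75, "hoodie": 0.40}
attributes = {
    "blazer": {"garment": "jacket", "color": "navy"},
    "suit": {"garment": "jacket", "color": "black"},
    "oxford_shirt": {"garment": "shirt", "color": "white"},
    "hoodie": {"garment": "sweatshirt", "color": "grey"},
}

cands = candidate_set(similarities, q_hat)
axis, gains = best_question(cands, similarities, attributes)
print(sorted(cands), axis)
```

In this sketch the size of `cands` plays the role of the ambiguity measure: a precise query would leave a single item and need no question, while a larger set triggers the question on the axis that splits the remaining belief most evenly. Under exchangeability of calibration and test queries, the split-conformal threshold gives marginal coverage of at least 1 - alpha for the true target.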
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2605.24634 [cs.CV]
  (or arXiv:2605.24634v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.24634

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Sui Yang Guang
[v1] Sat, 23 May 2026 15:49:16 UTC (4,142 KB)