
























Abstract: Multi-vector representations generated by late interaction models, such as ColBERT, enable superior retrieval quality compared to single-vector representations in information retrieval applications. In multi-vector retrieval systems, both queries and documents are encoded using one embedding per token, and similarity between queries and documents is measured by the MaxSim similarity measure. However, the improved quality of multi-vector retrieval comes at the expense of significantly increased search latency. In this work, we introduce LEMUR, a simple yet efficient framework for multi-vector similarity search. LEMUR consists of two consecutive problem reductions: First, we formulate multi-vector similarity search as a supervised learning problem that can be solved using a one-hidden-layer neural network. Second, we reduce inference under this model to single-vector similarity search in its latent space, enabling the use of existing single-vector search indexes to accelerate retrieval. LEMUR is an order of magnitude faster than prior multi-vector similarity search methods. Our code is available at this https URL
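As a rough illustration of the MaxSim measure mentioned in the abstract, the sketch below scores a query against a document by taking, for each query token embedding, the maximum inner product over the document's token embeddings, and summing those maxima. This follows the commonly cited ColBERT-style definition and assumes inner-product similarity; the function names `maxsim` and `brute_force_search` are hypothetical, and the exhaustive scan is only the slow baseline that methods such as LEMUR aim to avoid, not LEMUR itself.

```python
import numpy as np


def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim score between one query and one document.

    query_emb has shape (n_query_tokens, dim);
    doc_emb has shape (n_doc_tokens, dim).
    Assumption: token similarity is the plain inner product.
    """
    sims = query_emb @ doc_emb.T          # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())  # best doc token per query token, summed


def brute_force_search(query_emb: np.ndarray, docs: list, k: int = 1):
    """Exhaustive top-k search: score every document with MaxSim."""
    scores = np.array([maxsim(query_emb, d) for d in docs])
    top = np.argsort(-scores)[:k]         # indices of the k highest scores
    return top.tolist(), scores[top].tolist()


# Toy data: two query tokens, two documents with different token counts.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
doc_b = np.array([[1.0, 0.0]])

ids, scores = brute_force_search(query, [doc_b, doc_a], k=1)
print(ids, scores)
```

Because every document must be scored against every query token, the cost of this baseline grows with both the corpus size and the number of tokens per document, which is the latency problem the abstract describes.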
| Comments: | Accepted to ICML 2026 |
| Subjects: | Information Retrieval (cs.IR); Machine Learning (cs.LG) |
| Cite as: | arXiv:2601.21853 [cs.IR] (or arXiv:2601.21853v2 [cs.IR] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2601.21853 (arXiv-issued DOI via DataCite) |
From: Elias Jääsaari
[v1] Thu, 29 Jan 2026 15:26:32 UTC (3,108 KB)
[v2] Thu, 21 May 2026 17:20:12 UTC (3,624 KB)