

















Abstract: Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Since modern embedding models are distilled from LLM backbones, a frozen encoder should benefit from extra inference compute without retraining. An agentic program-search loop explores 144 candidate programs over a frozen encoder API and produces twelve Pareto-optimal programs spanning cost ratios from $c=1.2$ to $14.7$ over the single-pass baseline. The search independently rediscovers Rocchio pseudo-relevance feedback, ColBERT-style MaxSim at sentence granularity, reciprocal rank fusion, and the Fisher linear discriminant, all without trainable parameters or external models. Every frontier program improves nDCG@10 over the frozen baseline across all 14 MMTEB retrieval tasks spanning legal, financial, long-document, and general domains. The programs transfer without modification to unseen encoder families and nineteen held-out retrieval tasks, with 68% of model-task pairs admitting at least one frontier program that improves over the cosine baseline.
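Two of the techniques named in the abstract, Rocchio pseudo-relevance feedback and reciprocal rank fusion, can be illustrated with a minimal sketch over precomputed embeddings. This is not one of the paper's frontier programs: the function names, the toy vectors, and the parameter values (`k=2`, `alpha=1.0`, `beta=0.5`, and the conventional fusion constant `k=60`) are assumptions chosen for illustration, and the embeddings are taken to come from some frozen encoder that is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_cosine(query, docs):
    """Document indices sorted by descending cosine similarity to the query."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)


def rocchio_prf(query, docs, k=2, alpha=1.0, beta=0.5):
    """Rocchio-style pseudo-relevance feedback (positive feedback only).

    Treats the top-k documents of a first pass as relevant, moves the query
    toward their centroid, and ranks again. alpha, beta, and k are
    illustrative values, not taken from the paper.
    """
    top = rank_by_cosine(query, docs)[:k]
    centroid = [sum(docs[i][j] for i in top) / len(top) for j in range(len(query))]
    expanded = [alpha * q + beta * c for q, c in zip(query, centroid)]
    return rank_by_cosine(expanded, docs)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings by summing 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Ties are broken by document id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy embeddings standing in for the output of a frozen encoder (hypothetical).
docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query = [1.0, 0.1]

baseline = rank_by_cosine(query, docs)           # single-pass cosine ranking
feedback = rocchio_prf(query, docs)              # second pass with expanded query
fused = reciprocal_rank_fusion([baseline, feedback])
```

Both functions use only the encoder's output vectors and add no trainable parameters, which matches the constraint the abstract describes. The extra cost comes from the additional ranking pass.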
| Comments: | 16 pages, 4 figures |
| Subjects: | Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR) |
| Cite as: | arXiv:2605.11374 [cs.LG] (or arXiv:2605.11374v3 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.11374 (arXiv-issued DOI via DataCite) |
From: Han Xiao
[v1] Tue, 12 May 2026 00:56:34 UTC (215 KB)
[v2] Wed, 13 May 2026 00:56:03 UTC (126 KB)
[v3] Tue, 26 May 2026 14:57:14 UTC (264 KB)