When Recall Plateaus: The Late-Interaction Technique Most Teams Overlook
SapotaCorp · 2026-05-24 · via DEV Community

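A founder I was working with had been stuck on the same problem for two months. Their RAG retrieval recall sat at 58%. They had tried OpenAI's text-embedding-3-small, then text-embedding-3-large, then BGE-M3, and finally Voyage. Each swap gained a few points, but the curve had gone flat. The team was about to start fine-tuning its own embedding model.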

I told them to stop and add a reranker first. Recall went from 58% to 81% in a single afternoon. The fine-tuning project was shelved.

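This is the moment most teams realize the bottleneck is not the embedding model. It is the architecture they chose at the start: one embedding per chunk. Late interaction is a family of techniques that solves this problem, but most teams avoid it because the name sounds intimidating.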

What a single embedding per chunk loses

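A bi-encoder (which every standard embedding model is) takes a chunk of text, compresses it into a fixed-length vector, and stores it. At query time, the user's question is compressed into a vector as well, and the similarity between the two is computed.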

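That compression is the problem. A 500-token chunk that covers five different ideas gets averaged into a single vector. The vector roughly represents the chunk, but it loses resolution: it cannot tell "this chunk is mostly about X and mentions Y in passing" from "this chunk is mostly about Y and mentions X in passing." When a user asks about Y, both chunks look relevant by cosine distance, yet one is the right answer and the other is noise.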

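This is why "best embedding model" benchmarks show diminishing returns past a certain point. Embedding models have already squeezed what they can through the single-vector information bottleneck. The architecture is the limit.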
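A toy illustration of that bottleneck, using hand-made 5-dimensional vectors and mean pooling as a stand-in for the compression step (both are assumptions for illustration, not output from a real embedding model). Chunk A covers five topics, one of which answers the query exactly; chunk B is uniformly vague. After pooling, the vague chunk scores higher even though the exact match lives in chunk A.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_pool(token_vectors):
    """Collapse per-token vectors into one chunk vector by averaging each dimension."""
    return [sum(column) / len(token_vectors) for column in zip(*token_vectors)]

def one_hot(index, dim=5):
    """Unit vector standing in for a token about a single topic (toy assumption)."""
    return [1.0 if i == index else 0.0 for i in range(dim)]

query = one_hot(4)                             # the user asks about topic 4
chunk_a = [one_hot(i) for i in range(5)]       # five topics, one is an exact match
chunk_b = [[0.5, 0.5, 0.5, 0.0, 0.5]] * 5      # uniformly "sort of related" tokens

pooled_a = cosine(mean_pool(chunk_a), query)   # about 0.447
pooled_b = cosine(mean_pool(chunk_b), query)   # 0.5, so the vague chunk ranks first
best_token_a = max(cosine(t, query) for t in chunk_a)  # the signal pooling averaged away
best_token_b = max(cosine(t, query) for t in chunk_b)

print(f"pooled:     A={pooled_a:.3f}  B={pooled_b:.3f}")
print(f"best token: A={best_token_a:.3f}  B={best_token_b:.3f}")
```

The per-token maximum recovers the ordering that the pooled vectors get wrong, which is the intuition behind late interaction.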

What does late interaction do?

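ColBERT (the original, 2020) does not pool the token embeddings; it stores them individually. A 500-token chunk becomes 500 vectors; a 10-token query becomes 10 vectors. At scoring time, it finds, for each query token, the maximum similarity against every token in the chunk, then sums those maxima into the final relevance score.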

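The math is nothing exotic: it is dot products, the same as in vector search. The difference is that "how well does this query match this chunk" becomes the sum of "for each query token, find the best-matching chunk token," so the fine-grained signal that pooling throws away is preserved.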
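The scoring rule fits in a few lines. This is a minimal pure-Python sketch of the sum-of-maxima described above, not ColBERT's actual implementation (real systems batch the same operation as matrix products); the example vectors are made up.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vectors, chunk_vectors):
    """Sum, over query tokens, of the best dot product against any chunk token."""
    return sum(
        max(dot(q, d) for d in chunk_vectors)
        for q in query_vectors
    )

# Two query tokens and three chunk tokens in a toy 2-D space (illustrative values).
query_vectors = [[1.0, 0.0], [0.0, 1.0]]
chunk_vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 0.5]]

# First query token best matches chunk token 0 (1.0);
# second query token best matches chunk token 1 (0.8).
score = maxsim_score(query_vectors, chunk_vectors)
print(score)
```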

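In practice it looks like this:

  • The query "GPT-4o pricing breakdown" tokenizes to roughly four tokens.
  • Each query token looks for its best match in the candidate chunk.
  • "GPT-4o" matches strongly with the "GPT-4o" token in the chunk.
  • "pricing" matches "cost", "price", or "pricing" in the chunk.
  • "breakdown" matches "table", "structure", or "breakdown".
  • The sum of the maximum scores yields a relevance number that reflects every query token, not just an average semantic similarity.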

This is what sets the recall ceiling.
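The walk-through above can be traced with a small similarity table. The scores below are invented for illustration (a real ColBERT model would derive them from token embeddings), and the query is simplified to three word-level tokens, whereas a real tokenizer would likely split it further.

```python
# Hypothetical similarities: query token -> {chunk token: score}. Values are made up.
similarity = {
    "GPT-4o":    {"GPT-4o": 0.98, "cost": 0.10, "table": 0.05, "per": 0.02},
    "pricing":   {"GPT-4o": 0.12, "cost": 0.83, "table": 0.20, "per": 0.15},
    "breakdown": {"GPT-4o": 0.04, "cost": 0.25, "table": 0.71, "per": 0.18},
}

def explain_maxsim(similarity):
    """Return (best_matches, total): each query token's best chunk token, and the summed score."""
    best_matches = {}
    for query_token, row in similarity.items():
        chunk_token = max(row, key=row.get)          # chunk token with the highest score
        best_matches[query_token] = (chunk_token, row[chunk_token])
    total = sum(score for _, score in best_matches.values())
    return best_matches, total

best_matches, total = explain_maxsim(similarity)
for query_token, (chunk_token, score) in best_matches.items():
    print(f"{query_token!r} -> {chunk_token!r} ({score:.2f})")
print(f"relevance score: {total:.2f}")
```

Keeping the per-token matches around also makes the ranking explainable: each point of the score can be traced to a specific pair of tokens.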

Sapota uses ColBERT in two modes, depending on corpus size and latency budget.

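Mode 1: ColBERT as a reranker on top of bi-encoder retrieval. In the first stage, bi-encoder vector search returns 50 candidates. In the second stage, ColBERT reranks those 50 and keeps the top 5. This is the pattern in most of our production deployments. The first stage is fast (milliseconds, and it scales to billions of vectors); the second stage is slower, but it only reranks 50 candidates, not the whole index.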
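A minimal sketch of that two-stage shape, under stated assumptions: the first stage here is a brute-force dot product over mean-pooled vectors (a production system would use an approximate nearest-neighbor index), the vectors are toy values, and `first_k=2`, `final_n=1` stand in for the 50 and 5 mentioned above. The function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(token_vectors):
    """Single-vector stand-in for a bi-encoder embedding (toy assumption)."""
    return [sum(col) / len(token_vectors) for col in zip(*token_vectors)]

def maxsim(query_tokens, chunk_tokens):
    """Late-interaction score: sum of each query token's best match in the chunk."""
    return sum(max(dot(q, d) for d in chunk_tokens) for q in query_tokens)

def retrieve_then_rerank(query_tokens, token_store, first_k=50, final_n=5):
    """Stage 1 ranks by pooled vectors; stage 2 reranks only the survivors with MaxSim."""
    query_pooled = mean_pool(query_tokens)
    pooled_index = {cid: mean_pool(tokens) for cid, tokens in token_store.items()}
    candidates = sorted(
        pooled_index, key=lambda cid: dot(query_pooled, pooled_index[cid]), reverse=True
    )[:first_k]
    reranked = sorted(
        candidates, key=lambda cid: maxsim(query_tokens, token_store[cid]), reverse=True
    )
    return candidates, reranked[:final_n]

token_store = {
    "mixed":     [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # one exact match
    "vague":     [[0.0, 0.8, 0.6], [0.0, 0.8, 0.6]],                   # loosely related
    "off_topic": [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
}
query_tokens = [[0.0, 0.0, 1.0]]

candidates, top = retrieve_then_rerank(query_tokens, token_store, first_k=2, final_n=1)
print("stage 1:", candidates)   # pooled ranking puts "vague" ahead of "mixed"
print("stage 2:", top)          # MaxSim promotes "mixed"
```

The expensive scorer only ever sees `first_k` chunks, which is why the added latency stays bounded regardless of index size.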

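Mode 2: ColBERT as the standalone retriever. If the corpus is no larger than a few million chunks, PLAID or a similar index structure can make ColBERT the primary retriever, so that late-interaction retrieval is feasible at scale. Latency is higher than with a bi-encoder (roughly 10x to 50x, depending on index size), but recall is the highest of any retrieval method we have measured.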

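We default to Mode 1. We switch to Mode 2 only when the corpus is small enough for it to be feasible and the recall gain justifies the cost.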

ColPali: the same idea for documents

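ColPali extends late interaction to whole pages treated as images. Instead of tokenizing text and embedding the tokens, it embeds patches of the page image (each page is split into a 32 × 32 grid). Query tokens and image patches are compared with the same MaxSim operation.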

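What that means:

  • The OCR step can be dropped. The model looks at the page the way a vision-language model does.
  • Layout, charts, tables, and diagrams all live in the same embedding space.
  • Cross-modal queries (text query against images) work natively.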

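The cost lies in storage (1,024 vectors per page, one vector per patch) and indexing speed (vision-encoder inference is GPU-bound). Binary quantization cuts the storage cost by 32x and latency by an order of magnitude, which is what makes ColPali viable in production.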
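The 32x figure follows from replacing each 32-bit float with a single sign bit. Below is a rough sketch of that idea. The 128-dimensional vector size is an assumption (the text does not state it), and sign-bit packing with Hamming comparison is one common form of binary quantization, not necessarily the exact scheme used in any given deployment.

```python
def binarize(vector):
    """Pack the sign of each dimension into an int: bit i is 1 when vector[i] > 0."""
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits

def hamming_similarity(a_bits, b_bits, dim):
    """Number of dimensions (out of dim) whose sign bits agree."""
    return dim - bin(a_bits ^ b_bits).count("1")

def bytes_per_page(vectors_per_page, dim, bits_per_value):
    """Raw storage for one page's vectors, ignoring index overhead."""
    return vectors_per_page * dim * bits_per_value // 8

DIM = 128  # assumed embedding width, for illustration only
full = bytes_per_page(1024, DIM, 32)    # float32 values
binary = bytes_per_page(1024, DIM, 1)   # one sign bit per value
print(f"float32: {full:,} bytes/page, binary: {binary:,} bytes/page, ratio {full // binary}x")

a = binarize([0.3, -0.1, 0.7, -0.9])
b = binarize([0.2, 0.4, 0.1, -0.5])
print(hamming_similarity(a, b, dim=4))  # signs agree on 3 of the 4 dimensions
```

Comparing packed bits with XOR and a popcount is much cheaper than a floating-point dot product, which is plausibly where the latency reduction comes from.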

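On document-heavy corpora (research papers, financial filings, slide decks, regulatory submissions), ColPali outperforms both bi-encoder text RAG and CLIP-based multimodal RAG on public benchmarks. We use it when the corpus is genuinely visual and the budget can cover the storage and GPU inference cost.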

The cost conversation

Late interaction is not free. The honest trade-offs:

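  • Storage. A ColBERT chunk stores roughly 100x more vectors than a bi-encoder chunk (one vector per token in the chunk versus one vector per chunk). ColPali stores 1,024 vectors per page. Plan vector database capacity accordingly.
  • Indexing time. Building the index takes longer because there are more vectors to compute. It is not catastrophic: a single GPU can process a million-chunk corpus in a few hours.
  • Query time. Reranking mode adds 50 to 200 milliseconds to p50 latency, depending on how many candidates are reranked. A pure ColBERT setup adds more.
  • Operational complexity. Fewer vector databases support multi-vector storage with MaxSim comparison than support standard cosine search. Qdrant supports it natively. Most others do not.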
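For capacity planning, a back-of-the-envelope estimate helps. Every number in this sketch is an assumption chosen for illustration: 1,000,000 chunks, about 100 tokens per chunk (which gives the roughly 100x vector count mentioned above), 128-dimensional float32 token vectors, and a 1,536-dimensional float32 bi-encoder vector. Substitute your own figures.

```python
def index_size(chunks, vectors_per_chunk, dim, bytes_per_value=4):
    """Return (vector_count, total_bytes) for an uncompressed index, ignoring overhead."""
    vectors = chunks * vectors_per_chunk
    return vectors, vectors * dim * bytes_per_value

# Assumed workloads (illustrative, not measurements).
bi_vectors, bi_bytes = index_size(chunks=1_000_000, vectors_per_chunk=1, dim=1536)
mv_vectors, mv_bytes = index_size(chunks=1_000_000, vectors_per_chunk=100, dim=128)

print(f"bi-encoder:   {bi_vectors:>11,} vectors, {bi_bytes / 1e9:.1f} GB")
print(f"multi-vector: {mv_vectors:>11,} vectors, {mv_bytes / 1e9:.1f} GB")
```

Under these particular assumptions the vector count grows 100x while raw bytes grow by a smaller factor, because the per-token vectors are narrower. Both numbers matter: vector count drives index structure and query cost, bytes drive memory and disk.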

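What you weigh these costs against is the recall gain. Across the audits Sapota has run, teams whose recall was stuck in the 50 to 70 percent range moved into the 80 to 90 percent range after adding late-interaction reranking. That gap is the difference between "the AI can't be trusted" and "the AI is the best search interface we have."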

When not to add late interaction

Reranking is not always the answer. Skip it when:

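  • Recall is already above 90% (you are past the useful range; further gains will come from prompting or generation).
  • The corpus is small (under 50,000 chunks), so a cross-encoder reranker (BGE reranker, Jina reranker) gets you most of the gain with simpler infrastructure.
  • The latency budget is strict, under 100 ms (real-time conversational interfaces, where reranker overhead hurts the experience).
  • The team has no vector database that supports multi-vector MaxSim, and a migration is not on the roadmap.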

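For most production RAG systems with recall in the 60 to 75 percent range, late interaction is the next move. If the team is not ready for a full ColBERT setup, a cross-encoder reranker is the lightweight option.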

What changed for the founder

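The fix took half a day. We added a Jina reranker stage between their existing Qdrant retrieval and the LLM call. Recall jumped from 58% to 81%. Because the LLM was now seeing better context, faithfulness also rose from 0.79 to 0.93. The fine-tuning project was cancelled that same week.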
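Structurally, the change is one extra function between retrieval and generation. The sketch below shows that shape with a pluggable scoring function. `overlap_score` is a toy lexical stand-in so the example runs without a model; in a real pipeline it would be replaced by a call to a cross-encoder reranker. The function names are hypothetical and are not Qdrant's or Jina's API.

```python
def overlap_score(query, passage):
    """Toy stand-in for a cross-encoder: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words)

def rerank_stage(query, candidates, score_fn, top_n=5):
    """Re-order retrieved passages by score_fn(query, passage) and keep the best top_n."""
    return sorted(candidates, key=lambda p: score_fn(query, p), reverse=True)[:top_n]

# Passages as they might come back from the first-stage retriever (invented examples).
retrieved = [
    "our refund policy covers annual plans",
    "annual plan pricing and monthly pricing compared",
    "how to reset a password",
]

# Only the reranked top_n passages are passed on as LLM context.
context = rerank_stage("annual plan pricing", retrieved, overlap_score, top_n=2)
print(context)
```

Because the stage takes any `score_fn`, swapping the cross-encoder for a ColBERT-style scorer later does not change the surrounding pipeline.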

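The next conversation is about upgrading from the cross-encoder to a full ColBERT setup, which we have seen add another 4 to 6 points of recall on comparable corpora. At their current scale and budget, the cross-encoder is the right foundation. Full ColBERT is the v2 step.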

If your retrieval has plateaued

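If your team keeps swapping embedding models while the recall curve flattens, the bottleneck is almost certainly not the model; it is the architecture. Sapota runs a one-week reranking integration engagement: we add a cross-encoder or ColBERT stage on a working branch and evaluate it side by side against your current configuration.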
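A side-by-side evaluation can be as simple as computing recall@k for both configurations over the same labeled queries. This is a minimal sketch with one common definition of recall@k; the query IDs, document IDs, and rankings are invented placeholders, and a real evaluation would use your own labeled set.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mean_recall(runs, labels, k):
    """Average recall@k over every labeled query."""
    return sum(recall_at_k(runs[q], labels[q], k) for q in labels) / len(labels)

# Placeholder data: relevant documents per query, and each configuration's ranking.
labels = {"q1": ["d1", "d4"], "q2": ["d7"]}
baseline = {"q1": ["d2", "d1", "d9", "d4"], "q2": ["d3", "d5", "d7"]}
reranked = {"q1": ["d1", "d4", "d2", "d9"], "q2": ["d7", "d3", "d5"]}

baseline_recall = mean_recall(baseline, labels, k=2)
reranked_recall = mean_recall(reranked, labels, k=2)
print(f"baseline recall@2: {baseline_recall:.2f}")
print(f"reranked recall@2: {reranked_recall:.2f}")
```

Running both configurations against the same labels and the same k keeps the comparison honest: any difference comes from the ranking, not from the test set.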

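Reach out through the AI engineering page with the recall number you are seeing and the embedding models you have tried. The diagnosis is often this same conversation.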