From RAG to Knowledge Compilation
finisky · 2026-04-16 · via Finisky Garden

RAG re-retrieves, re-assembles, and re-reasons on every query. Ask something that requires synthesizing five documents and the model has to find all five, stitch them together, and derive the answer from scratch. Ask ten times, retrieve ten times. Nothing accumulates.

Karpathy recently posted a gist called LLM Wiki proposing a different approach: instead of retrieving at query time, have the LLM pre-compile knowledge into a structured wiki and query the compiled result.

Link: LLM Wiki (Karpathy)

Where RAG Falls Short

RAG’s operating model has a fundamental efficiency problem: every query repeats the retrieval and synthesis work from scratch.

You load 200 papers into a RAG system. Ask “what are the research trends in this field over the past three years?” The model hits the vector database, pulls back the top-10 chunks, stuffs them into the prompt, and generates an answer. Sounds reasonable, but think about it: the information needed spans dozens of papers. Ten chunks won’t cover it. Even at k=50, the relationships between chunks — contradictions, evolution of ideas, converging threads — all have to be figured out in a single inference pass.
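To make the pattern concrete, here is a minimal sketch of the retrieve-then-stuff loop. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and all names are illustrative assumptions rather than any framework's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int) -> list[str]:
    """Rank every chunk against the query and keep the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Stuff the retrieved chunks into a prompt. No state survives the call."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt` twice with the same question does exactly the same work twice, which is the point: whatever the model works out from the stuffed context is never written back anywhere.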

The next day you rephrase the question slightly. The model starts from zero again. Yesterday’s synthesis is gone.

NotebookLM, ChatGPT file uploads, most RAG frameworks — they all follow the same pattern: raw documents → retrieval → ad-hoc assembly → generation. Knowledge stays in the raw documents and never gets structurally organized. Interestingly, some products are already sidestepping this path: Claude Code’s retrieval uses no vector database or embedding index, relying on keyword search and file structure reasoning to pinpoint code in million-line codebases. At minimum, this shows vector retrieval isn’t the only game in town.

Add to this the theoretical ceiling of vector retrieval: single-vector models have a hard combinatorial limit on the top-k combinations they can represent. At 100k documents with k=100, the theoretical lower bound on the embedding dimension needed already approaches 4096, the maximum dimension used by current models. As queries get more complex, retrieval gets less reliable, and a better model does not fix that.
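For a feel of why the limit is called combinatorial, the sketch below only counts how many distinct top-k result sets exist for a given corpus size. It does not reproduce the dimension bound itself, which comes from the cited analysis; the function name is an illustrative assumption.

```python
import math


def log2_topk_sets(n_docs: int, k: int) -> float:
    """log2 of C(n_docs, k): the bits needed just to name one top-k set."""
    return math.log2(math.comb(n_docs, k))


# 100k documents with k=100, the setting mentioned in the text.
bits = log2_topk_sets(100_000, 100)
print(f"about 2^{bits:.0f} possible top-100 sets")
```

The number of result sets a query could legitimately ask for grows explosively with corpus size and k, while the embedding dimension stays fixed.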

Compile, Don’t Interpret

Karpathy’s approach flips the paradigm: instead of interpreting at query time, compile at ingest time.

The programming language analogy works well. RAG is interpreted execution — re-parsing source code on every run. LLM Wiki is compiled execution — source code gets processed once, then you run the compiled artifact.

Three layers. The bottom layer is raw documents: papers, articles, meeting notes, podcast transcripts. Read-only. The middle layer is an LLM-generated wiki: summaries, entity pages, concept pages, comparisons, syntheses. The LLM owns this layer entirely. The top layer is a schema file (like CLAUDE.md) that defines the wiki’s structure, conventions, and workflows.
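On disk, the three layers could look like the sketch below. The directory names `raw` and `wiki` and the placeholder schema text are assumptions for illustration; the gist describes the layers, not a fixed layout.

```python
from pathlib import Path


def init_knowledge_base(root: Path) -> dict[str, Path]:
    """Create a hypothetical three-layer layout under root and return its paths."""
    layout = {
        "raw": root / "raw",            # bottom layer: source documents, treated as read-only
        "wiki": root / "wiki",          # middle layer: pages written and owned by the LLM
        "schema": root / "CLAUDE.md",   # top layer: structure, conventions, workflows
    }
    layout["raw"].mkdir(parents=True, exist_ok=True)
    layout["wiki"].mkdir(parents=True, exist_ok=True)
    # Never overwrite an existing schema: it is the part a human curates.
    if not layout["schema"].exists():
        layout["schema"].write_text(
            "# Wiki schema\n\n(placeholder: page types, naming rules, ingest workflow)\n",
            encoding="utf-8",
        )
    return layout
```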

When you add a new document, the LLM doesn’t just store it for later retrieval. It reads the document, writes a summary page, then updates every relevant entity and concept page across the wiki — noting contradictions with existing claims, adding cross-references, revising summaries. A single document might trigger updates to 10-15 wiki pages.
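The ingest step might be sketched roughly as follows. The `llm` argument is a hypothetical callable that maps a prompt to text, and matching pages by title is a deliberately naive stand-in for letting the model decide which pages a document affects.

```python
from typing import Callable

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def ingest(doc_name: str, doc_text: str, wiki: dict[str, str], llm: LLM) -> list[str]:
    """Add one document to the wiki and return the names of all pages touched."""
    touched = []

    # Step 1: write a summary page for the new document.
    summary_page = f"summary/{doc_name}"
    wiki[summary_page] = llm(f"Summarize this document:\n{doc_text}")
    touched.append(summary_page)

    # Step 2: revise every existing page the document is relevant to.
    # Assumption: a page is relevant if its title appears in the document.
    lowered = doc_text.lower()
    for name in list(wiki):
        if name == summary_page:
            continue
        title = name.split("/")[-1].lower()
        if title in lowered:
            wiki[name] = llm(
                "Revise this wiki page using the new document. "
                "Note contradictions and add cross-references.\n"
                f"PAGE:\n{wiki[name]}\n\nDOCUMENT:\n{doc_text}"
            )
            touched.append(name)
    return touched
```

The returned list is the interesting part: one document going in, many pages coming out changed.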

Knowledge compiled once, continuously updated.

Querying the Compiled Wiki

After compilation, querying changes too. The model no longer digs through raw documents for chunks. It reads the wiki’s index file, finds relevant pages, and synthesizes from already-organized content.

The critical difference: cross-references are already built, contradictions already flagged, synthesis already done. The model doesn’t need to accomplish all of that in a single inference pass.

Karpathy’s own setup: an LLM agent on one side, Obsidian on the other, watching wiki pages update in real time. The LLM writes files, Obsidian renders them, graph view shows which concepts connect and which pages are orphans. His analogy: Obsidian is the IDE, the LLM is the programmer, the wiki is the codebase.

Query results can feed back into the wiki. Ask a comparison question, and the answer itself becomes a new wiki page. Every question you ask makes the knowledge base richer instead of vanishing into chat history.
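A rough sketch of that query-then-file loop is below. The index is modeled as a dict from page name to a one-line description, keyword overlap stands in for the model reading the index, and `llm` is again a hypothetical prompt-to-text callable.

```python
import re
from typing import Callable


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def answer_and_file(
    question: str,
    index: dict[str, str],
    pages: dict[str, str],
    llm: Callable[[str], str],
) -> str:
    """Answer from compiled pages, then store the answer as a new wiki page."""
    words = set(tokens(question))
    # Pick pages whose index description shares a word with the question.
    relevant = [name for name, desc in index.items() if words & set(tokens(desc))]
    context = "\n\n".join(pages[name] for name in relevant)
    answer = llm(f"Using these wiki pages:\n{context}\n\nAnswer: {question}")

    # File the answer back so later questions can build on it.
    page_name = "answers/" + "-".join(tokens(question))[:60]
    pages[page_name] = answer
    index[page_name] = question
    return page_name
```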

The Maintenance Problem, Solved

Everyone knows knowledge bases are useful. Few people maintain them. Whether it’s a team wiki or personal notes, the decay follows the same pattern: maintenance cost grows faster than usage value. After 50 pages of notes, every new page means checking for contradictions, updating cross-references, and verifying that old conclusions still hold against new data. Nobody wants to do that work.

LLMs happen to be good at exactly this: they won’t forget to update a cross-reference, won’t mind touching 15 files in one pass, won’t abandon the project because maintenance is boring. Karpathy also suggests periodic linting — having the LLM audit the wiki for contradictions, orphan pages, missing concept entries, and data gaps that could be filled with a web search.
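Part of that lint pass needs no LLM at all. The sketch below assumes pages are stored as a name-to-text dict and use Obsidian-style `[[Page]]` links; it reports orphan pages and links that point at pages that do not exist yet. Contradiction checks would still need the model.

```python
import re

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
LINK = re.compile(r"\[\[([^\]|#]+)")


def lint(pages: dict[str, str], roots: tuple[str, ...] = ("index",)) -> dict[str, list[str]]:
    """Find pages nothing links to, and link targets with no page."""
    inbound = {name: 0 for name in pages}
    missing = set()
    for name, text in pages.items():
        for target in LINK.findall(text):
            target = target.strip()
            if target in pages:
                if target != name:  # self-links do not rescue an orphan
                    inbound[target] += 1
            else:
                missing.add(target)
    orphans = sorted(n for n, count in inbound.items() if count == 0 and n not in roots)
    return {"orphans": orphans, "missing": sorted(missing)}
```

Missing targets double as a to-do list: each one is a concept the wiki mentions but has not written up.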

This is the key insight. Maintenance is the bottleneck in knowledge management, and LLMs reduce the human effort it takes to near zero.

Where It Fits

Karpathy lists several applications: personal knowledge management (journals, articles, podcast notes), research deep-dives (weeks or months of paper reading), book reading (chapter-by-chapter wiki with characters, themes, and plot threads), team knowledge bases (Slack threads, meeting transcripts, customer calls auto-organized).

The most compelling case is long-term research. Spend three months reading papers in a field, and you’ll forget the details of what you read in week one. RAG can help you locate a specific passage, but it can’t tell you how that passage relates to something from three weeks ago. The wiki can, because those relationships were built at ingest time.

The team use case is interesting too. Every team has the “we discussed this before, the conclusion is somewhere in a Slack channel” problem. An LLM continuously compiling those fragments into a structured wiki would reduce a lot of information loss.

Is RAG Dead Then?

So does LLM Wiki make vector-retrieval RAG pointless?

No. They solve different problems.

RAG solves “quickly locate relevant fragments in a large document collection.” It works for one-off queries against large corpora that don’t need deep synthesis. You have 100k customer support conversations, a user asks about a specific product issue, RAG finds the relevant ones in milliseconds. You don’t need and can’t afford to pre-compile a wiki for that.

LLM Wiki solves “continuously accumulate and synthesize knowledge from a manageable document collection.” Document count is moderate (tens to hundreds), but inter-document relationships are complex and need long-term maintenance.

Put differently: RAG is a search engine, LLM Wiki is an encyclopedia. You wouldn’t organize 100k support tickets like an encyclopedia, and you wouldn’t do a three-month literature review with a search engine.

The RAG community is already moving in this direction. Microsoft’s GraphRAG builds a knowledge graph before retrieval — essentially a form of knowledge compilation. LLM Wiki goes further: the compiled artifact isn’t a graph but human-readable documents. Both share the same judgment: query-time retrieval alone isn’t enough; you need structural processing at ingest time.

The real takeaway may be that retrieval shouldn’t be the end of the knowledge management pipeline. Many people dump documents into a vector database and call it done, but retrieval is just step one. For scenarios requiring deep understanding, the compilation step that goes beyond retrieval is what matters.

The Rough Edges

Karpathy’s gist is honest about being an “idea file” — it describes the pattern, not an implementation. Running this in practice has some obvious rough edges.

Hallucination risk gets amplified in a wiki context. In RAG, a hallucination affects one answer; the next query re-retrieves and has a chance to self-correct. In a wiki, if an entity page contains an incorrect fact, every subsequent analysis referencing that page builds on the error. Mistakes compile into the knowledge base and compound over time. This is probably the strongest argument for the lint mechanism: not just finding orphan pages, but catching errors that have been baked in.
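How far a single bad page can spread is easy to estimate if you know which pages cite which. The sketch below assumes a hypothetical `cites` map from each page to the pages it references, and walks it backwards from the erroneous page.

```python
from collections import deque


def affected_pages(cites: dict[str, set[str]], bad_page: str) -> set[str]:
    """Return every page that directly or transitively cites bad_page."""
    # Invert the graph: for each page, who cites it?
    cited_by: dict[str, set[str]] = {}
    for page, targets in cites.items():
        for target in targets:
            cited_by.setdefault(target, set()).add(page)

    # Breadth-first walk from the bad page through its citers.
    seen: set[str] = set()
    queue = deque([bad_page])
    while queue:
        current = queue.popleft()
        for citer in cited_by.get(current, ()):
            if citer != bad_page and citer not in seen:
                seen.add(citer)
                queue.append(citer)
    return seen
```

Everything in the returned set is a candidate for re-checking once the original error is fixed.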

Scale is a concern. Karpathy says the index file works at moderate scale (~100 sources, hundreds of pages), but beyond that you need a search engine. Searching structured wiki pages is qualitatively different from searching raw document chunks, though: wiki pages have titles, categories, and cross-references, so search precision and recall should be noticeably better. The real risk is elsewhere: as wiki pages multiply, each LLM update needs to check more related pages, so per-ingest maintenance cost gradually climbs.

Cost isn’t trivial either. Each document ingestion might trigger updates across a dozen pages, each update an LLM call. The compilation cost for 100 documents is far more than vectorizing them for a RAG pipeline. But flip the perspective: that cost is paid once at ingest, saving repeated reasoning on every query. If your query frequency is much higher than your ingest frequency, compilation pays for itself.
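The break-even point can be written down directly. This is a back-of-the-envelope sketch: the cost parameters are hypothetical inputs in whatever unit you like (tokens, dollars), and it assumes a wiki query is cheaper than a RAG query because the synthesis is already done.

```python
import math


def break_even_queries(
    n_docs: int,
    compile_cost_per_doc: float,
    index_cost_per_doc: float,
    rag_cost_per_query: float,
    wiki_cost_per_query: float,
) -> float:
    """Number of queries after which compiling beats plain vector indexing."""
    extra_upfront = n_docs * (compile_cost_per_doc - index_cost_per_doc)
    saving_per_query = rag_cost_per_query - wiki_cost_per_query
    if saving_per_query <= 0:
        return math.inf  # compilation never pays for itself
    return max(0.0, extra_upfront / saving_per_query)


# Made-up numbers: 100 documents, compiling costs 12 units per document versus
# 1 for embedding, and each query costs 3 units with RAG versus 1 on the wiki.
print(break_even_queries(100, 12.0, 1.0, 3.0, 1.0))  # 550.0
```

With those made-up numbers, compilation wins after 550 queries. A corpus that is queried rarely never gets there.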

An Old Idea, Newly Feasible

Karpathy closes with a reference to Vannevar Bush’s 1945 Memex: a private, curated knowledge store where links between documents are as valuable as the documents themselves. The problem Bush couldn’t solve was who does the maintenance. Eighty years later, LLMs fill that gap.

From Memex to wikis to Notion to RAG to LLM Wiki, the history of knowledge management is a repeated struggle with the same tension: storing is easy, organizing is hard. LLM Wiki isn’t the final answer, but it’s the first time “automatic organization” has been genuinely feasible. Whether the organization quality is good enough — that probably takes three months of use to find out.