

LLM Wiki v2 — extending Karpathy's LLM Wiki pattern with lessons from building agentmemory
2026-04-10 · via Hacker News - Newest: "LLM"


A pattern for building personal knowledge bases using LLMs. Extended with lessons from building agentmemory, a persistent memory engine for AI coding agents.

This builds on Andrej Karpathy's original LLM Wiki idea file. Everything in the original still applies. This document adds what we learned running the pattern in production: what breaks at scale, what's missing, and what separates a wiki that stays useful from one that rots.

What the original gets right

The core insight is correct: stop re-deriving, start compiling. RAG retrieves and forgets. A wiki accumulates and compounds. The three-layer architecture (raw sources, wiki, schema) works. The operations (ingest, query, lint) cover the basics. If you haven't read the original, start there.

What follows is what we found after building and running this pattern across thousands of sessions.

The missing layer: memory lifecycle

The original treats all wiki content as equally valid forever. In practice, knowledge has a lifecycle. A bug you discovered last week matters more than one from six months ago. A pattern you've seen twelve times is more reliable than one you've seen once. A claim from a newer source should weaken an older one automatically.

Confidence scoring. Every fact in the wiki should carry a confidence score: how many sources support it, how recently it was confirmed, whether anything contradicts it. When the LLM writes "Project X uses Redis for caching," that claim should know it came from two sources, was last confirmed three weeks ago, and sits at confidence 0.85. Confidence decays with time and strengthens with reinforcement. This turns the wiki from a flat collection of equally-weighted claims into a living model where the LLM can say "I'm fairly sure about X but less sure about Y."
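One way to make this concrete is sketched below. The `Claim` fields, the scoring formula, and the 90-day half-life are all assumptions chosen for illustration, not something the original or agentmemory prescribes; the point is only that source count, recency, and contradictions can be folded into a single number that decays and can be reinforced.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    sources: set[str] = field(default_factory=set)
    last_confirmed_day: float = 0.0   # days since an arbitrary epoch
    contradictions: int = 0


def confidence(claim: Claim, today: float, half_life_days: float = 90.0) -> float:
    """Illustrative score in [0, 1]; the formula and constants are assumptions."""
    # More independent sources push support toward 1 with diminishing returns.
    support = 1.0 - 0.5 ** len(claim.sources)
    # Exponential decay since the last confirmation.
    age = max(0.0, today - claim.last_confirmed_day)
    recency = 0.5 ** (age / half_life_days)
    # Each recorded contradiction halves the score.
    penalty = 0.5 ** claim.contradictions
    return support * recency * penalty


def reinforce(claim: Claim, source: str, today: float) -> None:
    """A new confirming source adds support and resets the decay clock."""
    claim.sources.add(source)
    claim.last_confirmed_day = today
```

With two sources and no contradictions, this sketch scores a claim at 0.75 on the day it is confirmed and 0.375 one half-life later; a third confirming source at that point lifts it to 0.875.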

Supersession. When new information contradicts or updates an existing claim, the old claim shouldn't just sit there with a note. The new one should explicitly supersede it. Linked, timestamped, old version preserved but marked stale. Version control for knowledge, not just for files.
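A minimal sketch of such a supersession link follows, assuming an in-memory dict as the claim store. The `ClaimVersion` type and its field names are hypothetical; any store that can keep both directions of the link would do.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ClaimVersion:
    claim_id: str
    text: str
    recorded_at: datetime
    supersedes: Optional[str] = None      # id of the claim this one replaces
    superseded_by: Optional[str] = None   # set when a newer claim replaces this one

    @property
    def stale(self) -> bool:
        return self.superseded_by is not None


def supersede(store: dict[str, ClaimVersion], old_id: str, new_id: str, new_text: str) -> ClaimVersion:
    """Add a new claim that replaces old_id; the old version is kept, marked stale."""
    old = store[old_id]
    new = ClaimVersion(new_id, new_text, datetime.now(timezone.utc), supersedes=old_id)
    old.superseded_by = new_id
    store[new_id] = new
    return new


def current(store: dict[str, ClaimVersion], claim_id: str) -> ClaimVersion:
    """Follow superseded_by links to the latest version of a claim."""
    claim = store[claim_id]
    while claim.superseded_by is not None:
        claim = store[claim.superseded_by]
    return claim
```

Because the old version stays in the store, a reader who lands on a stale claim can always be forwarded to the current one, and the history remains inspectable.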

Forgetting. Not everything should live forever. A wiki that never forgets becomes noisy. Implement a retention curve: facts that were important once but haven't been accessed or reinforced in months should gradually fade. Not deleted, but deprioritized. The LLM equivalent of moving something to a bottom drawer. Ebbinghaus's forgetting curve works well here: retention decays exponentially with time, but each reinforcement (access, confirmation from a new source) resets the curve. Architecture decisions decay slowly. Transient bugs decay fast.
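The retention curve can be sketched as R = exp(-t / S), where t is the time since the last reinforcement and S is a per-kind stability. The stability values and the 0.2 cutoff below are made-up numbers for illustration; tune them to your own domain.

```python
import math
from dataclasses import dataclass

# Hypothetical stability constants in days; larger means slower decay.
STABILITY_DAYS = {"architecture_decision": 365.0, "transient_bug": 14.0}


@dataclass
class Memory:
    kind: str
    last_reinforced_day: float


def retention(memory: Memory, today: float) -> float:
    """Ebbinghaus-style retention R = exp(-t / S), t in days since last reinforcement."""
    elapsed = max(0.0, today - memory.last_reinforced_day)
    return math.exp(-elapsed / STABILITY_DAYS[memory.kind])


def reinforce(memory: Memory, today: float) -> None:
    """An access or a confirmation from a new source resets the curve."""
    memory.last_reinforced_day = today


def is_deprioritized(memory: Memory, today: float, threshold: float = 0.2) -> bool:
    """Faded memories are ranked lower, not deleted."""
    return retention(memory, today) < threshold
```

Under these assumed constants, a transient bug left untouched for 60 days falls below the cutoff, while an architecture decision of the same age does not.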

Consolidation tiers. Raw observations aren't the same as established facts. Build a pipeline:

  • Working memory: recent observations, not yet processed
  • Episodic memory: session summaries, compressed from raw observations
  • Semantic memory: cross-session facts, consolidated from episodes
  • Procedural memory: workflows and patterns, extracted from repeated semantics

Each tier is more compressed, more confident, and longer-lived than the one before it. The LLM promotes information up the tiers as evidence accumulates. This is how you go from "I saw this once" to "this is how things work."
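As a rough sketch of promotion, the snippet below moves an item up the tiers once enough supporting observations have accumulated. Using a bare evidence count as the promotion rule, and the thresholds of 2, 5, and 12, are simplifying assumptions; a real pipeline would also compress and rewrite the content at each step.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    WORKING = 0
    EPISODIC = 1
    SEMANTIC = 2
    PROCEDURAL = 3


# Hypothetical thresholds: supporting observations needed to enter each tier.
PROMOTION_THRESHOLD = {Tier.EPISODIC: 2, Tier.SEMANTIC: 5, Tier.PROCEDURAL: 12}


@dataclass
class Item:
    summary: str
    evidence_count: int = 1
    tier: Tier = Tier.WORKING


def promote(item: Item) -> Item:
    """Move an item up one tier at a time while its evidence meets the next threshold."""
    while item.tier < Tier.PROCEDURAL:
        next_tier = Tier(item.tier + 1)
        if item.evidence_count < PROMOTION_THRESHOLD[next_tier]:
            break
        item.tier = next_tier
    return item
```

An item seen once stays in working memory; one seen twelve times ends up in procedural memory.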

Beyond flat pages: the knowledge graph

The original wiki is pages with wikilinks. That works, but you're leaving structure on the table. What you actually want is a typed knowledge graph layered on top of the pages.

Entity extraction. When the LLM ingests a source, it shouldn't just write prose. It should extract structured entities. People, projects, libraries, concepts, files, decisions. Each entity gets a type, attributes, and relationships to other entities. "React" is a library. "Auth migration" is a project. "Sarah" is a person who owns the auth migration and has opinions about React.

Typed relationships. Not all connections are equal. "uses," "depends on," "contradicts," "caused," "fixed," "supersedes" carry different semantic weight. A link that says "A relates to B" is less useful than "A caused B, confirmed by 3 sources, confidence 0.9."

Graph traversal for queries. When someone asks "what's the impact of upgrading Redis?", the LLM shouldn't just keyword-search. It should start at the Redis node, walk outward through "depends on" and "uses" edges, and find everything downstream. This catches connections that keyword search misses.
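The Redis example can be sketched as a breadth-first walk over typed edges. The triples and service names below are invented for illustration, and storing the graph as a plain list of (source, relation, target) tuples is an assumption made to keep the example small.

```python
from collections import deque

# (source, relation, target) triples; the entity names are made up for illustration.
EDGES = [
    ("session-service", "depends on", "Redis"),
    ("rate-limiter", "uses", "Redis"),
    ("checkout-api", "depends on", "session-service"),
    ("Redis", "supersedes", "Memcached"),
]


def downstream(edges, start, relations=("depends on", "uses")):
    """Breadth-first walk to everything that transitively depends on or uses `start`."""
    # Index incoming edges: target -> sources that point at it via an allowed relation.
    dependents = {}
    for source, relation, target in edges:
        if relation in relations:
            dependents.setdefault(target, []).append(source)

    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


impacted = downstream(EDGES, "Redis")
# impacted == ["session-service", "rate-limiter", "checkout-api"]
```

Note that "checkout-api" never mentions Redis directly, so a keyword search would miss it, and that the "supersedes" edge to Memcached is ignored because it is not a dependency relation.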

The graph doesn't replace the wiki pages. It augments them. Pages are for reading. The graph is for navigation and discovery.

Search that actually scales

The original relies on index.md, a single file cataloging every page. This works up to maybe 100-200 pages. Beyond that, the index itself becomes too long for the LLM to read in one pass, and you need real search.

Hybrid search. The best approach combines three streams:

  • BM25 (keyword matching with stemming and synonym expansion)
  • Vector search (semantic similarity via embeddings)
  • Graph traversal (entity-aware relationship walking)

Fuse the results with reciprocal rank fusion. Each stream catches things the others miss. BM25 finds exact terms. Vectors find semantic similarity. The graph finds structural connections. Together they beat any single approach.
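Reciprocal rank fusion itself is only a few lines. In the sketch below each stream is assumed to return an ordered list of page ids, the page names are hypothetical, and k=60 is a commonly used default rather than a value taken from agentmemory.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of page ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, page_id in enumerate(ranking, start=1):
            scores[page_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by page id for determinism.
    return sorted(scores, key=lambda page_id: (-scores[page_id], page_id))


bm25_hits = ["redis-config", "cache-layer", "auth-migration"]
vector_hits = ["cache-layer", "session-store", "redis-config"]
graph_hits = ["session-store", "cache-layer"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
# fused == ["cache-layer", "session-store", "redis-config", "auth-migration"]
```

RRF only looks at ranks, not raw scores, which is why it can combine BM25 scores, cosine similarities, and graph distances without any normalization step.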

Keep index.md as a human-readable catalog, but don't rely on it as the LLM's primary search mechanism past roughly 100-200 pages.

Automation: from manual to event-driven

The biggest practical gap in the original is that everything is manual. You drop a source and tell the LLM to process it. You remember to run lint periodically. You decide when to file an answer back.

In practice, you want hooks. Events that fire automatically:

  • On new source: auto-ingest, extract entities, update graph, update index
  • On session start: load relevant context from the wiki based on recent activity
  • On session end: compress the session into observations, file insights
  • On query: check if the answer is worth filing back (quality score > threshold)
  • On memory write: check for contradictions with existing knowledge, trigger supersession
  • On schedule: periodic lint, consolidation, retention decay
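A hook system for events like these can start as a small in-process registry. The `HookRegistry` class, the event name "new_source", and the handlers below are hypothetical placeholders; in a real setup the handlers would call the ingest, graph, and index steps.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class HookRegistry:
    """Tiny in-process event bus; event names mirror the list above."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        def register(handler: Handler) -> Handler:
            self._handlers[event].append(handler)
            return handler
        return register

    def fire(self, event: str, **payload) -> int:
        """Run every handler for `event` in registration order; return how many ran."""
        for handler in self._handlers[event]:
            handler(payload)
        return len(self._handlers[event])


hooks = HookRegistry()
log: list[str] = []


@hooks.on("new_source")
def ingest(payload: dict) -> None:
    log.append(f"ingest {payload['path']}")   # placeholder for the real ingest step


@hooks.on("new_source")
def update_index(payload: dict) -> None:
    log.append(f"index {payload['path']}")    # placeholder for the index update


hooks.fire("new_source", path="raw/design-doc.md")
```

Scheduled work such as lint or retention decay fits the same shape: a timer fires an event, and the handlers do the bookkeeping.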

The human should still be in the loop for curation and direction. But the bookkeeping, the part that makes people abandon wikis, should be fully automated.

Quality and self-correction

Not all LLM-generated content is good. Without quality controls, the wiki accumulates noise.

Score everything. Every piece of content the LLM writes should get a quality score. Is it well-structured? Does it cite sources? Is it consistent with the rest of the wiki? You can have the LLM self-evaluate, or use a second pass with a different prompt. Content below a threshold gets flagged for review or rewritten.

Self-healing. The lint operation from the original should be more than a suggestion. It should automatically fix what it can. Orphan pages get linked or flagged. Stale claims get marked. Broken cross-references get repaired. The wiki should tend toward health on its own, not only when you remember to ask.

Contradiction resolution. The original mentions flagging contradictions. That's step one. Step two is resolving them. The LLM should propose which claim is more likely correct based on source recency, source authority, and the number of supporting observations. The human can override, but the default behavior should usually be right.
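A proposal step along these lines could look like the sketch below. The weights, the 180-day half-life, and the 0.1 margin are arbitrary illustrative values, and the `authority` field assumes the schema assigns each source a score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    age_days: float               # time since the source was published
    authority: float              # 0..1, assumed to be assigned per source in the schema
    supporting_observations: int


def score(c: Candidate, half_life_days: float = 180.0) -> float:
    """Weighted blend of recency, authority, and support; the weights are illustrative."""
    recency = 0.5 ** (c.age_days / half_life_days)
    support = 1.0 - 0.5 ** c.supporting_observations
    return 0.4 * recency + 0.3 * c.authority + 0.3 * support


def propose_resolution(a: Candidate, b: Candidate, margin: float = 0.1):
    """Return (winner, needs_human). Close calls are flagged for human review."""
    first, second = sorted((a, b), key=score, reverse=True)
    return first, (score(first) - score(second)) < margin
```

Returning a `needs_human` flag keeps the default automatic while still routing near-ties to the person who can override.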

Multi-agent and collaboration

The original is single-user, single-agent. Many real use cases involve multiple agents or multiple people contributing to the same knowledge base.

Mesh sync. If multiple agents are working in parallel (different coding sessions, different research threads), their observations need to merge into a shared wiki. Last-write-wins works for most cases. For conflicts, resolve by timestamp and allow a manual override.
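A last-write-wins merge with a manual override can be sketched as follows. The `Observation` shape and the `pinned` override map are assumptions; the sketch also trusts each agent's clock, which a real deployment may not be able to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    key: str          # what the observation is about, e.g. "project-x/cache"
    value: str
    timestamp: float  # seconds since epoch, from the writing agent's clock
    agent: str


def merge(*replicas: dict[str, Observation],
          pinned: Optional[dict[str, Observation]] = None) -> dict[str, Observation]:
    """Last-write-wins merge across agent replicas; `pinned` holds manual overrides."""
    merged: dict[str, Observation] = {}
    for replica in replicas:
        for key, obs in replica.items():
            held = merged.get(key)
            # Later timestamp wins; ties fall back to agent name so every replica agrees.
            if held is None or (obs.timestamp, obs.agent) > (held.timestamp, held.agent):
                merged[key] = obs
    merged.update(pinned or {})  # a human decision always beats the clock
    return merged
```

Because the comparison is deterministic, merging the replicas in any order gives the same result, so agents can sync pairwise without a coordinator.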

Shared vs. private. Some knowledge is personal (my preferences, my workflow). Some is shared (project architecture, team decisions). The wiki needs scoping. Private observations that roll up into shared knowledge when promoted.

Work coordination. When multiple agents work on the same knowledge base, they need lightweight coordination. Who's working on what. What's blocked. What's done. Not a full task management system, just enough to prevent duplicate work and track progress.

Privacy and governance

The original doesn't mention this, but it matters. Sources often contain sensitive information: API keys, credentials, private conversations, PII.

Filter on ingest. Before anything hits the wiki, strip sensitive data. API keys, tokens, passwords, anything marked private. This should be automatic, not something you remember to do.
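A first-pass filter can be a handful of regular expressions run before anything is written. The two patterns below are illustrative assumptions and will miss many real secret formats; treat them as a starting point, not as adequate coverage.

```python
import re

# Illustrative patterns only; a real deployment would need a broader, tested set.
SECRET_PATTERNS = [
    # key=value style assignments such as "api_key: abc123" or "password = hunter2"
    (re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[:=]\s*)\S+"), r"\1\2[REDACTED]"),
    # Authorization headers carrying bearer tokens
    (re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
]


def redact(text: str) -> str:
    """Replace anything matching a secret pattern before the text reaches the wiki."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running `redact` inside the ingest hook, rather than as a separate manual step, is what makes the filtering automatic.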

Audit trail. Every operation on the wiki (ingest, edit, delete, query) should be logged with a timestamp, what changed, and why. This is your accountability layer. When something looks wrong in the wiki, the audit trail tells you how it got there.
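An append-only JSON Lines file is enough to start with. The field names and the file layout in this sketch are assumptions; the only requirement from the text is that each entry records when, what changed, and why.

```python
import json
from datetime import datetime, timezone


def audit(log_path: str, operation: str, target: str, change: str, reason: str) -> dict:
    """Append one JSON line per wiki operation; the field names are assumptions."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation": operation,   # ingest, edit, delete, or query
        "target": target,
        "change": change,
        "reason": reason,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def history(log_path: str, target: str) -> list[dict]:
    """Every logged operation that touched `target`, oldest first."""
    with open(log_path, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e for e in entries if e["target"] == target]
```

When a page looks wrong, `history` answers the question the text poses: which operations touched it, in what order, and for what stated reason.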

Bulk operations with governance. As the wiki grows, you'll want to bulk-delete stale content, export subsets, or merge duplicate entities. These operations should be audited and reversible.

Crystallization: compounding from exploration

The original mentions that "good answers can be filed back into the wiki as new pages." This can be taken further.

Crystallization is the process of taking a completed chain of work (a research thread, a debugging session, an analysis) and automatically distilling it into a structured digest. What was the question? What did we find? What files/entities were involved? What lessons emerged? This digest becomes a first-class wiki page, and the lessons get extracted as standalone facts that strengthen the knowledge base.

Your explorations are a source, just like an article or a paper. The wiki should treat them that way. Ingest the results, update the graph, strengthen or challenge existing claims.

Output formats beyond markdown

The original mentions Marp for slide decks and matplotlib for charts. The wiki's output shouldn't be limited to markdown pages. Depending on the query, the right output might be:

  • A comparison table
  • A timeline visualization
  • A dependency graph
  • A slide deck for presenting findings
  • A structured data export (JSON, CSV) for further analysis
  • A brief for someone else on your team

The wiki is the knowledge store. The output format depends on the audience and the question.

The schema is the real product

The original implies this but it's worth being direct: the schema document (CLAUDE.md, AGENTS.md) is the most important file in the system. It's what turns a generic LLM into a disciplined knowledge worker. It encodes:

  • What types of entities and relationships exist in your domain
  • How to ingest different kinds of sources
  • When to create a new page vs. update an existing one
  • What quality standards to apply
  • How to handle contradictions
  • What the consolidation schedule looks like
  • What's private vs. shared

You and the LLM co-evolve this document over time. The first version will be rough. After a few dozen sources and a few lint passes, you'll have a schema that reflects how your domain actually works. That schema is transferable. Share it with someone else working on a similar domain and they get a running start.

Implementation spectrum

All of this is modular. You don't need everything on day one.

Minimal viable wiki: raw sources + wiki pages + index.md + a schema that describes ingest/query/lint workflows. This is roughly what the original describes. It works. Start here.

Add lifecycle: confidence scoring, supersession, basic retention decay. This prevents the wiki from becoming a junk drawer.

Add structure: entity extraction, typed relationships, knowledge graph. This makes queries better and surfaces connections you'd miss with flat pages.

Add automation: hooks for auto-ingest, auto-lint, context injection. This is where the maintenance burden drops to near zero.

Add scale: hybrid search, consolidation tiers, quality scoring. This is what you need when the wiki grows past a few hundred pages.

Add collaboration: mesh sync, shared/private scoping, work coordination. This is for teams or multi-agent setups.

Pick your entry point based on your needs. The pattern works at every level.

Why this matters

Karpathy's original insight stands: the bottleneck is bookkeeping, and LLMs eliminate that bottleneck. What we've added is the machinery that keeps the wiki healthy as it scales. Lifecycle management so knowledge doesn't rot. Structure so connections aren't lost. Automation so humans stay focused on thinking rather than filing. Quality controls so the wiki earns trust over time.

The Memex is finally buildable. Not because we have better documents or better search, but because we have librarians that actually do the work.


This document extends Andrej Karpathy's LLM Wiki with patterns proven in agentmemory, a persistent memory engine for AI agents built on iii-engine. The original idea file is the foundation; this adds what we learned building the engine.