Compass v1.1.0 · we shipped a memory plugin that catches its own consumption drift
chunxiaoxx · 2026-05-24 · via DEV Community

Compass v1.1.0 · the recall consumption fix

We shipped nautilus-compass v1.1.0 12 hours after v1.0.0. v1.0.0 was the
public stable cut. v1.1.0 fixes a class of failure that v1.0.0 surfaces
but does not catch · which we caught in our own usage 5 hours after
launch.

The bug we caught in production

A sister Claude Code dialog was supposed to publish a long-form article
to wechat using a 6-step quality pipeline (audit-gate, xhs-cards-embed,
specific account login flow). The pipeline was documented in cross-session
memory · a file called publisher_quality_pipeline_20260430.md.

Compass recall fired correctly · the file appeared in the agent's
UserPromptSubmit hook output:

🟢 [3h old] memory/publisher_quality_pipeline_20260430.md
       audit-gate / xhs-cards-embed / wxid · v6 must pass the critic's 6-dimension scoring before publishing


The agent saw the title. Saw the 80-character description. Acted. It
did not Read the file body. The actual rules (how to walk audit-gate,
which wxid, what the xhs-cards-embed structure looks like) were in the
body. None of them entered the agent's working context.

The agent then reproduced exactly the failure mode the file was written
to prevent: ad-hoc _tmp_publish_v8.cjs scripts, no critic round, wrong
login path.

The user's diagnosis was sharp:

Compass recalled it · I did not consume it · this is persona drift at the agent layer · not a failure of compass itself

That's half right. Recall surfaced the right file. The agent failed to
consume. But the shape of the recall response made the failure easy:
we returned title + 120-char description. Easy to skim. Easy to assume
you have read it when you have only read the index.

This is structural. Not the agent's fault.

The three-layer fix in v1.1.0

v0 · embed body in top-3 hits

Top-3 recall hits now embed the first 800 characters of post-frontmatter
body in an indented block:

🟢 score=0.84 · [3h old] memory/publisher_quality_pipeline_20260430.md
       audit-gate / xhs-cards-embed / wxid · v6 must pass the critic's 6-dimension scoring
       │ # Publisher quality pipeline
       │
       │ Six-step pipeline mandatory before publishing to wechat:
       │ 1. audit-gate · V6 critic checks against 6 dimensions ...
       │ 2. xhs-cards-embed · embed cards into article body via ...
       │ 3. wxid login flow · use wxid `chunxiaox` not openid_of_first_follower
       │ ...
       │ … (+1273 more · Read publisher_quality_pipeline_20260430.md for rest)


The agent now has the rules in its working context. No additional Read
tool call required. Tail hits 4..K stay header-only to keep the response
bounded (~3KB total).
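
As a rough illustration of the rendering rule described above, here is a minimal Python sketch. It is not the nautilus-compass source: the function names, the `---` frontmatter delimiter, and the simplified header line (no freshness marker or score) are assumptions made for the example. Only the 800-character limit, the top-3 cutoff, and the shape of the trailer line come from the text.

```python
import re

BODY_PREVIEW_CHARS = 800   # from the post: first 800 characters of the body
TOP_K_WITH_BODY = 3        # from the post: only the top-3 hits embed body

# Assumption: frontmatter is a leading block delimited by '---' lines.
_FRONTMATTER = re.compile(r"\A---\s*\n.*?\n---\s*\n", re.DOTALL)


def strip_frontmatter(text: str) -> str:
    """Return text with a leading '---' frontmatter block removed, if present."""
    return _FRONTMATTER.sub("", text, count=1)


def render_hit(rank: int, path: str, description: str, text: str) -> str:
    """Render one recall hit. Ranks 0..2 get an indented body preview;
    later ranks stay header-only so the response stays bounded."""
    lines = [path, f"       {description}"]
    if rank < TOP_K_WITH_BODY:
        body = strip_frontmatter(text).strip()
        preview = body[:BODY_PREVIEW_CHARS]
        for row in preview.splitlines():
            lines.append(f"       │ {row}".rstrip())
        remaining = len(body) - len(preview)
        if remaining > 0:
            name = path.rsplit("/", 1)[-1]
            lines.append(f"       │ … (+{remaining} more · Read {name} for rest)")
    return "\n".join(lines)
```

Cutting on a fixed character count keeps the cost per hit predictable, at the price of sometimes truncating mid-sentence; the trailer line tells the agent how much is left and where to Read it.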

v1 · embed past-mistake body in anti-anchor alerts

Compass's drift detector matches the current prompt against 35 negative
anchors learned from prior mistakes ("I guess it should be like this ·
the user won't check anyway", "pretend the plan was settled last time ·
the user has probably forgotten", ...).

Until v1.1.0 the alert just said: "matched anti-anchor X with cos=0.625".
Same problem as v0 — label visible, body invisible, agent shrugs.

v1.1.0 alerts now embed body from the most-relevant past lesson session.
Two-tier match: substring 6-gram against the anchor + lesson-type
frontmatter (Tier 1, precise) · falls back to recent drift!=green
sessions (Tier 2, the agent's own self-reported slip-ups). Every alert
becomes actionable, not decorative.
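
One way to read the two-tier match is sketched below in Python. This is an interpretation, not the shipped code: the `Session` fields (`kind`, `drift`, `mtime`), the use of character 6-grams, and the tie-breaking rules are all assumptions. The post only states that Tier 1 combines a 6-gram substring match with lesson-type frontmatter and that Tier 2 falls back to recent sessions whose drift is not green.

```python
from dataclasses import dataclass
from typing import Optional

NGRAM = 6  # from the post: substring 6-gram match


@dataclass
class Session:
    path: str
    kind: str     # hypothetical frontmatter field, e.g. "lesson"
    drift: str    # hypothetical frontmatter field, e.g. "green" / "yellow" / "red"
    mtime: float  # last-modified time; larger means more recent
    body: str


def char_ngrams(text: str, n: int = NGRAM) -> set[str]:
    """All character n-grams of text (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def pick_lesson(anchor: str, sessions: list[Session]) -> Optional[Session]:
    """Pick the past session whose body should be embedded in the alert."""
    grams = char_ngrams(anchor)

    def overlap(s: Session) -> int:
        return sum(g in s.body for g in grams)

    # Tier 1 (precise): lesson-type sessions sharing a 6-gram with the anchor.
    tier1 = [s for s in sessions if s.kind == "lesson" and overlap(s) > 0]
    if tier1:
        return max(tier1, key=overlap)

    # Tier 2 (fallback): the most recent session that self-reported drift.
    tier2 = [s for s in sessions if s.drift != "green"]
    return max(tier2, key=lambda s: s.mtime) if tier2 else None
```

If neither tier matches, the sketch returns `None`, in which case an alert would have no body to embed.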

v2 · detect "recall fired but not consumed"

The most direct signal: did the agent actually open any of the files
recall surfaced?

recall_consumption.py (new module) walks back through the live session
jsonl file, finds N most-recent recall blocks, extracts memory file
paths, then checks subsequent assistant turns for matching Read tool
calls. If recall surfaced N paths and 0 got read, that is the failure
signature.

Wired into:

  • drift_check MCP tool result — runs even when the BGE daemon is unreachable, since the audit is pure file traversal
  • mid_session_hook every 25 tool calls — only nags when ≥3 unconsumed AND ratio < 0.3 (real signal, not noise)

Tested on a 130MB / 32k-line session: 41 recall hits surfaced, 0 consumed.
Smoking gun for "label != consumption" drift.
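
The audit described above can be sketched in a few lines of Python. The event schema here (`kind`, `text`, `tool`, `path` keys) is a simplified, hypothetical stand-in for the real session jsonl format, and the function names are invented for the example. The thresholds (at least 3 unconsumed, ratio below 0.3) are the ones quoted for mid_session_hook.

```python
import json
import re

# Assumption: recalled files are referenced as memory/<name>.md in recall text.
MEMORY_PATH = re.compile(r"memory/[\w./-]+\.md")


def audit_consumption(jsonl_lines, last_n_blocks=5):
    """Compare paths surfaced by recent recall blocks with later Read calls."""
    events = [json.loads(line) for line in jsonl_lines if line.strip()]
    recall_idx = [i for i, e in enumerate(events) if e.get("kind") == "recall"]
    surfaced, consumed = set(), set()
    for i in recall_idx[-last_n_blocks:]:
        paths = set(MEMORY_PATH.findall(events[i].get("text", "")))
        surfaced |= paths
        # Only Read calls that happen after the recall block count.
        for later in events[i + 1:]:
            if later.get("kind") == "tool_call" and later.get("tool") == "Read":
                target = later.get("path", "")
                consumed |= {p for p in paths if target.endswith(p)}
    ratio = len(consumed) / len(surfaced) if surfaced else 1.0
    return {
        "surfaced": len(surfaced),
        "consumed": len(consumed),
        "unconsumed": sorted(surfaced - consumed),
        "ratio": ratio,
    }


def should_nag(report, min_unconsumed=3, max_ratio=0.3):
    """Nag only on a real signal: many unconsumed hits and a low read ratio."""
    return len(report["unconsumed"]) >= min_unconsumed and report["ratio"] < max_ratio
```

Because the check is plain file traversal and string matching, it needs no embedding model, which is consistent with it running even when the BGE daemon is unreachable.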

V7 v0.2 · the governance plan that scales without templates

v1.0.0 shipped a thin V7 governance layer with three tools:
governance_dispatch (fan-out router), governance_audit (cross-agent
fake-closure scanner), governance_lock_check (L0 hash lock for the
immutable core). 13 MCP tools total.

v0.1 dispatch worked, but it was a fan-out router: given
channels=[dev.to, x, github] it produced one bounty per channel via
static dict lookup. A user asked the right question:

Across thousands of industries there are all kinds of task types · templates will never cover them all.

Right. Templates cannot cover the long tail of industries. The platform
side already solved this for publishing — channel adapters + anchor
pack registry — so adding a new channel or vertical = data change, not
code change.

v1.1.0 brings the same idea to decomposition. The new
governance_plan MCP tool reads two file-exported registries:

  1. _platform_registry/agents_capabilities.json — what each executor declares it can do (id, outputs, optional domains, optional anchor packs)
  2. _platform_registry/anchor_packs_phases.json — per-domain DAG of phases, each phase says requires_capability and depends_on

For each phase, V7 ranks executors by capability score (+10 capability
match, +5 domain match, +3 anchor pack match), picks the highest, emits
a queue file with depends_on_phase_ids so platform-side cron mints
bounties in the right order.
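
A minimal Python sketch of that ranking step follows. The score weights (+10, +5, +3) and the field names `requires_capability`, `depends_on`, and `depends_on_phase_ids` come from the text; everything else is an assumption for illustration, including the idea that a capability match is tested against the executor's `outputs` list, the `pack_id` argument, and the shape of the queue entries.

```python
from graphlib import TopologicalSorter


def score(executor, phase, domain, pack_id):
    """Capability score as quoted in the post: +10 / +5 / +3."""
    s = 0
    if phase["requires_capability"] in executor.get("outputs", []):
        s += 10  # capability match
    if domain in executor.get("domains", []):
        s += 5   # domain match
    if pack_id in executor.get("anchor_packs", []):
        s += 3   # anchor pack match
    return s


def plan(domain, pack_id, phases, executors):
    """Assign each phase to its highest-scoring executor, in dependency order."""
    graph = {p["id"]: set(p.get("depends_on", [])) for p in phases}
    by_id = {p["id"]: p for p in phases}
    queue = []
    # static_order() yields every phase after the phases it depends on.
    for pid in TopologicalSorter(graph).static_order():
        phase = by_id[pid]
        best = max(executors, key=lambda e: score(e, phase, domain, pack_id))
        queue.append({
            "phase_id": pid,
            "executor": best["id"],
            "depends_on_phase_ids": sorted(graph[pid]),
        })
    return queue
```

With an executor list where V5 declares `write` and `publish` and V6 declares `numeric-audit`, a phase requiring `numeric-audit` goes to V6 even if V5 matches the domain, because the capability weight outranks the domain weight. Ties go to whichever executor appears first in the list, which is a choice of this sketch rather than documented behavior.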

Verified on two domains:

  • marketing/dev-tools → 4 phases routed V5/V5/V5/Kairos
  • caishen-finance/audit → 5 phases · V6 wins for numeric-audit (V5 doesn't declare it · V5 takes write+publish)

Adding medical/literature-review next: 1 row in platform_anchor_packs
+ 1 row in platform_agents.metadata.capabilities[]. Zero V7 source
change. Zero MCP tool surface change.

What stayed unchanged · the eval headlines

Eval numbers are still the v1.0.0 locked numbers from 2026-05-08:

| Metric                            | nautilus-compass  | best public baseline         |
|-----------------------------------|-------------------|------------------------------|
| LongMemEval-S (n=500)             | 56.6%             | Zep 55-60% (different judge) |
| EverMemBench-Dynamic Run 1        | 44.4% (n=500)     | MemOS 42.55                  |
| EverMemBench-Dynamic Run 2        | 47.3% (n=497)     |                              |
| Drift detector ROC AUC (held-out) | 0.83              |                              |
| Reproduction cost                 | $3.50 end-to-end  | $50+ for GPT-4o-judge stacks |

v1.1.0 doesn't move the eval numbers. It moves the consumption
numbers — the ratio of recall hits whose body actually lands in the
agent's working context. We do not have a clean benchmark for that yet
(suggestions welcome) but in our own sessions it went from "skim the
title and proceed" to "rules-in-context by default."

Try it

pip install nautilus-compass==1.1.0
# or
npm install nautilus-compass@1.1.0


Two papers on arXiv (drift detection + memory pipeline). 228 pytest
tests, all green. License: MIT (anchors CC0).

Repo: github.com/chunxiaoxx/nautilus-compass

In-browser drift demo (no install): huggingface.co/spaces/chunxiaox/nautilus-compass

Postscript · what we believe

Recall != consumption · only reading the body counts as consumption · otherwise a hit is worth zero

Long-running agents drift. They forget rules they read three sessions
ago. They reproduce mistakes someone else already paid for. The fix is
not a smarter model · it is making the rules unmissably present in the
working context, then auditing whether they were actually consumed,
then making the audit cheap enough to run every 25 tool calls.

That is what v1.1.0 ships.