Harness Engineering Still Needs Governance
Theo Valmis · 2026-05-18 · via DEV Community


The industry has moved from prompt engineering to harness engineering: execution systems that coordinate models, tools, memory, and retries across long-running agent loops. Harnesses solve how agents act. They do not solve whether the actions stay within architectural boundaries. As autonomous workflows scale, the missing layer is governance infrastructure.

The shift: prompts, harnesses, execution systems

Three years ago, the prevailing question was how to phrase a prompt. The model was the product, and prompt engineering was the surface where teams competed.

That framing no longer matches what is being built. OpenAI's recent harness work on Codex, Anthropic's pattern for long-running managed agents, Cursor's background-agent runtime, Claude Code's session-aware workflow, and the open-source frameworks — LangGraph, CrewAI, AutoGen — all converge on the same architectural move: the model is wrapped inside an execution system that handles tools, memory, retries, planning, and continuity.

Single calls have given way to execution loops. Assistants have given way to autonomous workflows. The model is no longer the product boundary. The execution environment is.

That is the right move. Anyone building agentic systems in 2026 needs a harness. But it is also incomplete. As soon as agents are doing real work across many sessions, on real codebases, the harness reveals what it does not cover: architectural intent.

What harness engineering solves

Harness engineering handles context lifecycle management, tool orchestration, retries and error recovery, planning and execution loops, memory injection and continuity, and observability and execution coordination.

This is a major architectural advancement over prompt engineering. Treating the execution environment as the product boundary lets autonomous workflows survive contact with real systems: rate limits, partial failures, long horizons, multi-tool coordination, and cross-session continuity.
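To make the division of labor concrete, here is a minimal sketch of the harness side in Python. The Harness class, its retry policy, and the flaky tool are hypothetical illustrations. They are not the API of Codex, Claude Code, LangGraph, or any other framework named above. The loop retries a failing tool call with exponential backoff and records each outcome in memory so later steps can see it:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Harness:
    """Hypothetical minimal harness: runs tool steps, retries failures, keeps memory."""
    max_retries: int = 3
    base_delay: float = 0.0  # seconds; 0 keeps this demo instant
    memory: List[str] = field(default_factory=list)

    def run_step(self, name: str, tool: Callable[[], str]) -> str:
        for attempt in range(1, self.max_retries + 1):
            try:
                result = tool()
                # Continuity: record the outcome for later steps and sessions.
                self.memory.append(f"{name}: ok ({result})")
                return result
            except RuntimeError as exc:
                # Error recovery: note the failure, then retry with backoff.
                self.memory.append(f"{name}: attempt {attempt} failed ({exc})")
                if attempt == self.max_retries:
                    raise
                time.sleep(self.base_delay * 2 ** (attempt - 1))
        raise AssertionError("unreachable: the loop either returns or raises")


# Usage: a tool that is rate limited twice before succeeding.
calls = {"n": 0}


def flaky_edit() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "patched billing.py"


harness = Harness()
print(harness.run_step("edit", flaky_edit))  # patched billing.py
print(len(harness.memory))  # 3
```

Note what the loop never asks: whether the edit should be allowed at all.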

None of what follows is an argument against any of that. The argument is that this layer alone is not enough.

The governance gap

A harness can make an agent faster, more persistent, more autonomous, and more capable. It cannot, by construction, make the agent architecturally aligned. Continuity is not constraint. Orchestration is not enforcement. Memory of past decisions is not authority to refuse a future one.

The failures look like this:

  • An ADR (architecture decision record) is bypassed. The repo records a decision ("do not introduce a runtime ORM") that the agent does not read at session start. The agent introduces an ORM because it solves the immediate ticket.
  • A forbidden dependency reappears. A package was removed for a documented reason. A later session reintroduces it because the prohibition lives only in a stale doc, not in an enforcement hook.
  • A governed system is rewritten. The agent refactors a module that had a specific layering contract. The new version is functionally equivalent and passes tests, but violates the layering rule that was the entire point of the original design.
  • Layering boundaries are crossed. A controller starts calling into a data layer that the architecture forbids it from touching directly.
  • Naming conventions drift. Each session is internally consistent. Across sessions, the naming gradually changes.
  • Infrastructure patterns mutate. A standard for how services are exposed, configured, or deployed is silently replaced by a sensible-looking alternative that the rest of the system does not expect.
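Several of these failures, such as a reintroduced package or a controller reaching into the data layer, become checkable by a deterministic rule once the decision is written down as data rather than prose. The sketch below is a hypothetical Python illustration. The Change type, the path prefixes, and the ADR numbers are invented for the example, and real import detection would need a parser rather than a pre-extracted list:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    path: str                        # file the agent touched
    added_imports: Tuple[str, ...]   # modules the change newly imports


# Hypothetical rules distilled from ADRs (IDs invented for illustration).
FORBIDDEN_PACKAGES: Dict[str, str] = {"sqlalchemy": "ADR-007: no runtime ORM"}
LAYER_PREFIXES: Dict[str, str] = {
    "app/controllers/": "controller",
    "app/services/": "service",
    "app/data/": "data",
}
FORBIDDEN_EDGES: Dict[Tuple[str, str], str] = {
    ("controller", "data"): "ADR-012: controllers go through services",
}


def layer_of(path: str) -> Optional[str]:
    """Return the architectural layer a file path belongs to, if any."""
    for prefix, layer in LAYER_PREFIXES.items():
        if path.startswith(prefix):
            return layer
    return None


def check_change(change: Change) -> List[str]:
    """Return violations in a fixed order: same change in, same result out."""
    violations = []
    src = layer_of(change.path)
    for module in change.added_imports:
        top_level = module.split(".")[0]
        if top_level in FORBIDDEN_PACKAGES:
            violations.append(
                f"{change.path}: imports {module} ({FORBIDDEN_PACKAGES[top_level]})"
            )
        # Map a dotted module name onto the same path prefixes used for files.
        dst = layer_of(module.replace(".", "/") + "/")
        reason = FORBIDDEN_EDGES.get((src, dst))
        if reason:
            violations.append(f"{change.path}: {src} -> {dst} ({reason})")
    return violations


change = Change("app/controllers/user.py", ("app.data.user_repo", "sqlalchemy.orm"))
for violation in check_change(change):
    print(violation)
```

A layering violation is just a forbidden edge in the dependency graph, which is why a refactor can pass every test and still add one.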

None of these failures are caused by the harness being bad. They are caused by the harness being the wrong place to enforce architectural decisions. The harness's job is to keep work moving. Its incentives are continuity and throughput, not refusal.

Harnesses preserve execution continuity. They do not preserve architectural intent.

Why observability is insufficient

The most common response to the governance gap is to lean harder on observability. Trace every tool call. Log every diff. Pipe agent activity into a dashboard. If we can see what the agent did, we can correct it.

That argument confuses two different questions.

Observability answers what happened. Governance answers what should have been allowed. These are not the same problem.

  • Logs are not policy. A log records that a forbidden dependency was added. It does not refuse the add.
  • Traces are not invariants. A trace shows the call graph. It does not declare which call graphs are valid.
  • Visibility is not enforcement. A dashboard surfaces drift after it occurs. It does not block the change that produced the drift.

Observability is necessary — you cannot govern what you cannot see — but it sits on the wrong side of the action. By the time the trace reaches the dashboard, the commit has already happened. Governance has to sit in front of the action it constrains, with a deterministic rule about whether to allow it.
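The difference is where the check runs relative to the write. In this hypothetical Python sketch, both wrappers record what happened, but only the second evaluates a rule before the change is applied. The substring policy is deliberately crude and stands in for any deterministic rule:

```python
from typing import Callable, List


class PolicyViolation(Exception):
    """Raised when a change is refused before it is applied."""


def observe_then_apply(change: str, apply: Callable[[str], None], log: List[str]) -> None:
    # Observability only: the change lands first, the log records it afterwards.
    apply(change)
    log.append(f"applied: {change}")


def gate_then_apply(
    change: str,
    apply: Callable[[str], None],
    log: List[str],
    allowed: Callable[[str], bool],
) -> None:
    # Governance: a deterministic rule is evaluated before the action runs.
    if not allowed(change):
        log.append(f"blocked: {change}")
        raise PolicyViolation(change)
    apply(change)
    log.append(f"applied: {change}")


def no_runtime_orm(change: str) -> bool:
    """Hypothetical rule standing in for 'do not introduce a runtime ORM'."""
    return "sqlalchemy" not in change


repo: List[str] = []
log: List[str] = []
observe_then_apply("add dependency sqlalchemy", repo.append, log)
try:
    gate_then_apply("add dependency sqlalchemy", repo.append, log, no_runtime_orm)
except PolicyViolation:
    pass
print(repo)  # ['add dependency sqlalchemy']: only the observed path wrote to the repo
print(log)   # ['applied: add dependency sqlalchemy', 'blocked: add dependency sqlalchemy']
```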

Governance propagation across execution surfaces

Long-running, autonomous agents do not only write source code. They write everywhere the workflow touches:

  • Branch names and PR titles — auto-generated by the harness, often outside the team's branch and title taxonomy.
  • Commit messages and tags — workflow-generated commits accumulate in history.
  • CI metadata and pipeline config: written by agents the same way they write code, yet usually subject to stricter governance constraints than code.
  • Deployment artifacts and release notes — manifests, container tags, generated changelogs.
  • Generated configuration — feature flags, routing rules, scaling policies.
  • Agent-produced documentation — READMEs, ADR drafts, runbooks that become the next agent's training context.

Governance must propagate across every surface touched by autonomous execution. A governance layer that enforces ADR compliance in src/ but ignores commit messages, PR titles, CI config, and generated docs is governing a fraction of the agent's output.
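The same kind of deterministic check can run on non-code surfaces. This is a minimal Python sketch that assumes a hypothetical Conventional-Commits-style taxonomy for branch names, PR titles, and commit messages. A real team would substitute its own conventions. Surfaces without a rule fail closed, so nothing the agent writes passes by default:

```python
import re
from typing import Dict

# Hypothetical taxonomy; a team would derive these patterns from its own ADRs.
TYPE = r"(feat|fix|chore|docs)"
SURFACE_RULES: Dict[str, re.Pattern] = {
    "branch": re.compile(rf"{TYPE}/[a-z0-9-]+"),
    "pr_title": re.compile(rf"{TYPE}(\([a-z0-9-]+\))?: .+"),
    "commit": re.compile(rf"{TYPE}(\([a-z0-9-]+\))?: .+"),
}


def check_surfaces(outputs: Dict[str, str]) -> Dict[str, bool]:
    """Return, per surface, whether the agent's output fits the taxonomy."""
    results = {}
    for surface, text in outputs.items():
        rule = SURFACE_RULES.get(surface)
        # Fail closed: a surface with no rule is reported as non-compliant.
        results[surface] = rule is not None and rule.fullmatch(text) is not None
    return results


agent_output = {
    "branch": "agent/session-4f2a",
    "pr_title": "fix(billing): round invoice totals",
    "commit": "Update stuff",
    "release_notes": "Misc improvements",
}
print(check_surfaces(agent_output))
# {'branch': False, 'pr_title': True, 'commit': False, 'release_notes': False}
```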

The next layer: governance infrastructure

The clean way to think about the emerging stack is to stop treating it as a model-plus-tooling problem and start treating it as a layered system:

Models           — produce candidate output
  ↓
Harnesses        — coordinate execution, retries, tools
  ↓
Execution        — long-running loops, sessions, memory
  ↓
Governance       — defines and enforces architectural constraints
  ↓
Verification     — tests, builds, deploy-time checks


Each layer answers a question the layer above it cannot:

  • Harnesses answer: how does the agent act?
  • Execution systems answer: how does the agent keep working across time?
  • Governance answers: which actions are allowed, and according to which decisions?
  • Verification answers: did the resulting system still pass its objective checks?

Governance is its own layer because the problem it solves is not solvable inside any of the others. Models produce text. Harnesses coordinate. Memory recalls. None of them can deterministically resolve which ADR governs a given change, or block output that violates the active decision graph.
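Resolving which decision governs a change is mechanical once ADRs carry a scope and supersession links. The sketch below is a hypothetical Python illustration, not how any particular tool represents its decision graph. Each decision lists the glob patterns it governs and the earlier ADRs it supersedes, and resolution returns the still-active decisions whose scope matches a path, in a stable order:

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Decision:
    adr_id: str
    scope: Tuple[str, ...]            # glob patterns this decision governs
    rule: str
    supersedes: Tuple[str, ...] = ()  # edges in the decision graph


def governing(path: str, decisions: Sequence[Decision]) -> List[str]:
    """Return IDs of active decisions whose scope matches path, sorted for determinism."""
    superseded = {old for d in decisions for old in d.supersedes}
    matches = [
        d.adr_id
        for d in decisions
        if d.adr_id not in superseded and any(fnmatch(path, g) for g in d.scope)
    ]
    return sorted(matches)


# Hypothetical corpus. Note that fnmatch's "*" also matches "/" characters.
DECISIONS = [
    Decision("ADR-003", ("src/db/*",), "an ORM is allowed inside src/db"),
    Decision("ADR-007", ("src/*",), "no runtime ORM", supersedes=("ADR-003",)),
    Decision("ADR-012", ("src/controllers/*",), "controllers call services, not the data layer"),
]

print(governing("src/controllers/user.py", DECISIONS))  # ['ADR-007', 'ADR-012']
print(governing("src/db/conn.py", DECISIONS))           # ['ADR-007']
print(governing("docs/README.md", DECISIONS))           # []
```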

This is what makes governance infrastructure a category, not a feature. It cannot be folded into the harness without giving the harness an enforcement responsibility that conflicts with its continuity responsibility. It cannot be folded into observability without losing its blocking authority.

Where Mneme fits

Mneme is a deliberately narrow layer in this stack. It does not orchestrate tools. It does not retry calls. It does not manage memory or context. It does one thing: it compiles the repository's ADR corpus into a deterministic decision graph and enforces it at the boundaries where agents make consequential changes.

  • Repo-native. ADRs live in the repository. The decision graph is rebuilt from them on every check.
  • Deterministic enforcement. Given the same decision graph and the same change, the result is the same every time.
  • Governance before generation. At session start, mneme check --mode warn tells the agent which decisions are active before it writes anything. At pre-commit and in CI, mneme check --mode strict is the enforcement gate.
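The split between an advisory mode and a blocking mode maps naturally onto exit codes. The following Python sketch is a hypothetical illustration and does not reproduce Mneme's implementation or file format. The "Status:" and "Forbid:" ADR lines, compile_rules, and run_check are invented for the example, and substring matching on a diff stands in for real analysis. It compiles accepted ADRs into rules in a fixed order, reports violations in both modes, and fails only in strict mode:

```python
import re
from typing import Dict


def compile_rules(adr_texts: Dict[str, str]) -> Dict[str, str]:
    """Map each forbidden token to the accepted ADR that forbids it."""
    rules: Dict[str, str] = {}
    for adr_id in sorted(adr_texts):  # fixed order: same corpus, same rules
        text = adr_texts[adr_id]
        status = re.search(r"^Status:\s*(\w+)", text, re.MULTILINE)
        if status is None or status.group(1).lower() != "accepted":
            continue  # superseded or draft ADRs do not constrain anything
        for token in re.findall(r"^Forbid:\s*(\S+)", text, re.MULTILINE):
            rules.setdefault(token, adr_id)
    return rules


def run_check(diff: str, rules: Dict[str, str], mode: str) -> int:
    """Return an exit code: 'warn' always returns 0, 'strict' returns 1 on violations."""
    if mode not in ("warn", "strict"):
        raise ValueError(f"unknown mode: {mode}")
    violations = sorted(f"{tok} (forbidden by {adr})" for tok, adr in rules.items() if tok in diff)
    for violation in violations:
        print(f"[{mode}] {violation}")
    return 1 if mode == "strict" and violations else 0


corpus = {
    "ADR-007": "Status: Accepted\nForbid: sqlalchemy\n",
    "ADR-009": "Status: Superseded\nForbid: requests\n",
}
rules = compile_rules(corpus)
diff = "+import sqlalchemy\n+import requests\n"
print(run_check(diff, rules, "warn"))    # prints the ADR-007 violation, then 0
print(run_check(diff, rules, "strict"))  # prints the ADR-007 violation, then 1
```

Using one function in both places is the design point: the agent sees at session start exactly the decisions that can later block its commit.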

Harnesses help agents act. Governance ensures they act within architectural boundaries. Verification confirms the result. All three layers are infrastructure. None of them substitute for the others.

Long-running agents need more than memory and orchestration. They need enforceable architectural boundaries. The next phase of agent infrastructure is the layer that provides them.


Originally published at https://mnemehq.com/insights/harness-engineering-still-needs-governance/