惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

N
News and Events Feed by Topic
Malwarebytes
Malwarebytes
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
C
Cybersecurity and Infrastructure Security Agency CISA
F
Future of Privacy Forum
C
Cisco Blogs
T
The Exploit Database - CXSecurity.com
A
Arctic Wolf
S
Securelist
K
Kaspersky official blog
S
Schneier on Security
T
ThreatConnect
T
Tenable Blog
Spread Privacy
Spread Privacy
T
True Tiger Recordings
AWS News Blog
AWS News Blog
F
Fox-IT International blog
量子位
T
Threatpost
V
Vulnerabilities – Threatpost
C
CERT Recently Published Vulnerability Notes
Cisco Talos Blog
Cisco Talos Blog
GbyAI
GbyAI
宝玉的分享
宝玉的分享
腾讯CDC
G
Google Developers Blog
aimingoo的专栏
aimingoo的专栏
Cyberwarzone
Cyberwarzone
有赞技术团队
有赞技术团队
S
SegmentFault 最新的问题
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
U
Unit 42
雷峰网
雷峰网
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Simon Willison's Weblog
Simon Willison's Weblog
O
OpenAI News
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The GitHub Blog
The GitHub Blog
The Register - Security
The Register - Security
MyScale Blog
MyScale Blog
小众软件
小众软件
A
About on SuperTechFans
Last Week in AI
Last Week in AI
Y
Y Combinator Blog
博客园 - 三生石上(FineUI控件)
美团技术团队
Google Online Security Blog
Google Online Security Blog
P
Proofpoint News Feed
MongoDB | Blog
MongoDB | Blog

DEV Community

Avoid Cross Module Dependencies with Dependency Cruiser Invariant-Driven Architecture: 20M transactions on a €80/mo Cloud VM. Stop using external npm packages just to generate a UUID v4 Choosing the Right Gemma 4 Model Matters More Than Choosing the Best One From HTTPS to UCP: Shopping Is About to Stop Being Your Problem From Creation to Consumption: How Antigravity 2.0 and Gemini Spark Are Defining the Agentic Era 10 Mistakes I Wish I Knew Before Taking the CKA Exam AI That Actually Does Stuff: Autonomous Agents Explained Exploring AI workflow Orchestration: Comparing Weft, Python & Alternative Pipeline Approaches El Poder del Aprendizaje Federado: Cuando los Algoritmos Distribuidos Entrenan a la IA Email Marketing Automation in 2026: 5 Tools (and 1 Self-Hosted) Through Their APIs A Replay Runbook For Missed Publishing Windows Why timeout handling matters more than most backend logic How I Make $6,800/Month Selling Niche VS Code Extensions Model Routing Cost Checklist: Hosted APIs, Open Models, Or Self-Hosted Inference? ORA-00207 오류 원인과 해결 방법 완벽 가이드 Deno 2.8 Operator Upgrade Checklist: CI, Lockfiles, Node Compatibility, And Rollback AI-Discovered Vulnerabilities Need A Triage Queue, Not A Panic Channel AI Agent Workboards Need Audit Controls Before They Need More Agents Demystifying DevRel: What It Actually Is (And Why Should You Become One?) Your AI, Your Device, Your Data - Introducing Aide Gemma 4 GenAI Coach - GenAI Concepts Made Easy with an Interactive Playground QuietPulse - Mood Tracker Principal Components in TypeScript (Part 3) The pgAudit Attribution Gap: Why Role-Level Logging Fails GDPR and How to Close It Gemma 4 CAD Orchestrator I built a local Postgres triage co-pilot because HIPAA says I can't paste plans into ChatGPT or Claude Live Holographic Editor In Fractal Time Everbench: A document management system with Local Intelligence Instanton in Fractal Time The Hidden Features of Claude How I Built an AI News Brief with Next.js, Supabase, Vercel, and GPT-4o-mini How We Built a Multi-Agent AI Documentation System (And What We Learned) I got tired of writing post-mortems — so I built RCAi for SREs MIA: A Futuristic AI Desktop Assistant Built with Voice, Gestures, and Controlled Chaos Best Programming Language for Backend Web Development: PHP vs Python PayPal Alternatives for Indian Businesses: Best Payment Gateways for International Card Payments (2026) Gemma 4 Made Me Rethink Local AI: Not Just Text, But Images Too Clean Architecture in .NET Explained (The Dependency Rule) I Compiled Rust to WebAssembly and Made My JavaScript 6 Faster Outlook.com Is the Final Boss of 'Just Send an Email' Conditional Statements and Control Flow in Python Insults & Cutlasses, Local LLM Sword Fighting on Melee Island Production Lab: ECS Fargate + Prometheus + Grafana + Loki + Alloy + Node Exporter How 12 AI agent frameworks handle human approval (most badly) The Four-Index Reality: Why AI Search Isn't One Thing I Scanned 1 Million AI Services. Here's What Worries Me More Than the Vulnerabilities Managing multiple docker hub accounts using docker-use System Design Interview: Decentralized Web Crawler Metric Cardinality: High or Low? 4 Steps to Making the Right Choice 로컬 LLM 셋업 가이드 (v23) GEO vs SEO in 2026 — What Google's May Guidance Changed Cursor Review 2026 — Honest 'Not For Me' Take From a VSCode User Hello from rikuq — a practitioner blog for solo AI SaaS founders Why DevOps Engineers Need Practical Tutorials, Not Just Theory AI Agents in CI/CD: Give Them Context, Not Production Authority Now I See Why Translators Are Panicking Over AI—Should Coders Panic Too? Why I Track HRV Every Morning (And How It Actually Changes My Day) Diffusion Language Models: How NVIDIA's Nemotron-Labs DLM Is Killing Token-by-Token Generation Chatbots GPT pour le support client : ce que les équipes françaises ont réellement besoin de savoir I Hit the 1,232-Byte Wall So You Don't Have To Google Just Rebuilt the Search Box (Again) — But This Time It's Different Aether: A local Android assistant built with Gemma 4 BoxAgnts Introduction (1) — Out of the Box mkdev: trusted HTTPS for localhost, mapped by name Just one question, one answer. Why Java Still Rules the Programming World in 2026 Four Architectures for Letting Claude Edit Elementor (and Why We Shipped Clone-and-Mutate) yard-yaml 0.1.1: safer UTF-8 handling for YAML documentation I Built a Mac App That Keeps Your Clipboard in Sync Across All Your Android Devices Stop Using UUIDs: Why B2B SaaS Needs ULIDs in Laravel 🐘 I'm a non-technical founder who built a Slack approval tool. Here's what actually broke first. Open-Sourcing Our Game AI Stack — SDKs, Templates, and CLI Tools for NPC Dialogue I Built an AI System That Makes 1,000 Decisions a Day. Here's Where I Drew the Line. Lets Encrypt DNS Challenge with Traefik and AWS Route 53 Building an agent-ready website: how to make your site readable for ChatGPT, Perplexity and autonomous agents A productivity tool with GitHub as your cloud database How We Built Dynamic NPC Dialogue with LLMs — Lessons from Early Access cmux: The Native macOS Terminal Built for Running AI Coding Agents in Parallel Deep Atlantic Storage: Rewriting in Rust How I Built a Bulk Image Optimizer with $0 Server Costs Using Vanilla JS and Canvas API Humans and Machines read differently, I think I have a fix? Claude Code Deleted 92 Images Without Asking. This Happens More Than You Think. Method Calling Stack in Java I Built Schedule Sensei & Pushed It to GitHub – Here's What's Inside (And I Need Your Help 👀) OIC: From a Working Toast Watcher to a General "Watch It for Me" Agent Memory is two-thirds of what an AI chip costs to build The XState persistence problem is five years old. Here is what we built to finally solve it. i added MCP support to my SaaS in an afternoon. here's the whole thing. Framework: Link Building ☁️ Importing existing S3 buckets into Terraform state made easy with terraform import existing s3 bucket I Built a Token System on Solana (Without Any Backend Code) 터미널 AI 에이전트 구축 (v21) I Built an AI 3D Model Generator — Here's How I Handle Meshes in the Browser 🛡️ PromptGuard: I Built a Local AI Privacy Firewall That Sanitizes Your Prompts Before They Leave Your Machine PostgreSQL WAL Bloat: Why Automatic Management Is Often Insufficient? Seven PRs Before Lunch: Parallel Claude Code Tabs Plus Audit-Before-Bump Deployment using all three Kubernetes probes Qwen 3.6 Has Four Tiers. Here's How to Route Without Burning Cash. RAG 시스템 실전 구축 (v21)
Your LLM Is Not an Agent. Your Framework Is Not Enough. You Need a Harness.
Seenivasa Ra · 2026-05-25 · via DEV Community

Introduction

Every team building with AI agents hits the same wall. The demo works beautifully. The agent answers questions, calls tools, produces results. Then you ship it and the cracks appear it loses track of what it was doing, burns through API calls in circles, ignores boundaries it should respect, forgets context from five minutes ago. Users lose trust. Engineers lose sleep.

This is not a model problem. The LLM is capable. It's an infrastructure problem. The agent has a brain but no operating environment no structured loop to run in, no memory to draw on, no rules to constrain it, no way to resume where it left off. You gave it intelligence without giving it a way to apply that intelligence reliably.

That operating environment is called a Harness. And it's what separates a demo agent from one you'd actually trust in production.

What breaks without a harness

🔁 Infinite loops or premature stops. The agent has no governing loop it either runs forever or halts before the task is done.

🧠 Context amnesia. Long tasks overflow the context window. The agent loses the thread and starts hallucinating or repeating itself.

💾 No memory between sessions. Every conversation starts from zero. Multi-step, multi-day workflows are impossible.

🔧 Tool failures cascade. One flaky API brings the whole agent down because there's no error handling layer.

🚨 No guardrails. The agent touches system it should not.

You're Already Using the Pieces. A Harness Is How You Make Them Work Together.

If you've been building AI agents for a while, you know the drill. You pick a framework CrewAI, LangGraph, Strands, Microsoft Agent Framework and you start wiring things up. You add memory so the agent remembers things. You register tools so it can take actions. You configure guardrails so it doesn't go off the rails. You set up a loop so it keeps working until the task is done.

And it works. Mostly. In development, in demos, in controlled tests.

Then you put it in front of real users, with real tasks, over real time and you start seeing the cracks. The agent forgets things it shouldn't. It handles a task perfectly on Monday and fumbles the same task on Thursday. Two similar agents behave inconsistently. A tool fails and the whole run degrades silently. You added all the right pieces but somehow the whole is less than the sum of its parts.

This is the problem a harness solves. And here's the key thing to understand.

The core idea

A harness doesn't replace your framework. You're not choosing between them. Your framework gives you the ingredients memory, tools, loops, guardrails. The harness is the recipe the deliberate architectural decisions about how those ingredients are assembled, coordinated, and governed so your agent behaves consistently every single time.

Think of it like building a house. The framework is lumber, concrete, wiring, plumbing everything you need. T*he harness is the blueprint* and the construction process which material goes where, in what order, connected how, inspected by whom. Without a blueprint, you might still end up with a structure. But it probably won't hold up when the weather turns.

The PM & Developer Analogy

Here's a mental model that makes this concrete. In a software team, a Product Manager writes a story. It has context, a clear task, acceptance criteria, and scope boundaries. A Developer picks it up and delivers it. But the developer doesn't just start typing they follow a process. They use version control, a build system, coding standards, and a defined way to ask for help or escalate a blocker. That process is what makes delivery reliable, not just the developer's raw talent.

Now replace the developer with an AI Agent. The PM's story is the task prompt. The agent is the developer. The harness is the process the structured operating environment that governs how the agent reads the story, uses its tools, manages its memory, escalates when stuck, and knows when it's truly done.

The framework puts the tools in the developer's hands. The harness defines how the developer uses them consistently, safely, and with the right behavior for each situation.

Framework vs. Harness: Ingredient vs. Recipe

Here's where most explanations go wrong they imply frameworks are incomplete or that you shouldn't use them. That's backwards. Frameworks are excellent. They just operate at a different layer than a harness.

You can have every framework primitive in place and still have an unreliable agent because nobody made the architectural decisions about how they work together. That's the gap the harness fills.

The Decisions a Harness Makes

Every harness whether you've named it that or not is making below architectural decisions. Here's what each one actually means, and why it's a decision rather than just a feature you turn on.

The Thinking Loop Not just running, but knowing when to stop

Every framework gives you a loop. The harness decides the rules of that loop what counts as "done," how many iterations are too many, how to detect when the agent is stuck in circles, and when to break out and surface an error. Without these rules, your loop either exits too early or runs until your API bill catches fire.

Framework gives you: the loop mechanism. Harness decides: the exit conditions, the stuck-detection logic, the iteration limits.

The Working Memory Not just storing, but knowing what to keep

Context management

A context window is finite. As a task runs across many turns, old information competes with new information for that space. The harness makes the call: what gets summarized, what gets evicted, what always stays, and in what priority order. Without this policy, long tasks gradually degrade as the agent's window fills with stale or low-priority content.

Framework gives you: the context window. Harness decides: what lives in it at each point in the task lifecycle.

The Toolbox Not just available, but governed

Skills & Tools

Registering a tool in your framework makes it available. The harness decides which tools this specific agent, running this specific task, is actually allowed to use and what happens when a tool fails. Retry? Fall back to a different tool? Surface an error? Carry on? Each of these is a deliberate decision, and making them ad hoc leads to inconsistent behavior.

Framework gives you: tool registration. Harness decides: tool authorization, retry logic, fallback strategy, failure handling.

The Team Not just spawning, but coordinating

Sub-agents

Multi-agent frameworks let you spawn sub-agents. The harness defines how work gets divided, which sub-agent gets what, how their outputs are validated, and how the results are stitched back together. Without this, you end up with agents doing overlapping work, producing conflicting results, or silently dropping pieces of the task.

Framework gives you: sub-agent communication primitives. Harness decides: delegation strategy, output validation, result merging logic.

The Standard Library Capabilities every agent gets for free

Built-in skills
Some capabilities file reads, HTTP calls, date parsing, writing to memory are so universal that every agent needs them, and no agent should be writing boilerplate to get them. The harness bakes these in as defaults. Every agent inherits them, they behave consistently, and they're tested once rather than reimplemented per agent.

Framework gives you: the ability to add tools. Harness decides: which tools are universal defaults across every agent you build.

The Long-Term Memory Not just remembering, but knowing what's worth remembering

Session persistence
Frameworks give you a persistent store. The harness defines the policy around it what gets written to long-term memory, when, in what format, and how it gets retrieved and surfaced in future sessions. A poorly designed persistence policy is almost worse than none: your agent retrieves irrelevant old context and lets it pollute fresh tasks.

Framework gives you: the storage layer. Harness decides: write policy, retrieval strategy, relevance scoring, session restoration logic.

The Briefing Assembling the right instructions at the right moment

System prompt assembly

Most developers write a system prompt once and leave it static. But a static prompt is a blunt instrument. The harness assembles it dynamically at runtime composing the base instructions, the current task, the available tools, the relevant memory, and any user or role-specific context into one coherent briefing. Same agent, different context, different briefing. This alone is one of the biggest levers on agent quality.

Framework gives you: a system prompt field. Harness decides: what goes in it, dynamically, based on task and state.

The Audit Trail Every action, logged and explainable

Lifecycle hooks

Lifecycle hooks exist in most frameworks as extension points. The harness is the thing that actually wires them up into a coherent observability strategy logging every tool call, tracking cost per run, catching errors before they cascade, and giving you an answer to "what exactly did this agent do and why" for any given task. Without this wiring, you're flying blind.

Framework gives you: hook attachment points. Harness decides: what gets logged, measured, alerted on, and how errors propagate.

The Guardrails Not just checking, but enforcing consistently

Permissions & Safety
Frameworks give you input and output guardrail hooks. The harness defines the actual safety policy: which actions require human approval, what the agent is never allowed to do regardless of instructions, how prompt injection attempts are handled, and what happens when a guardrail fires. Guardrail hooks without a coherent policy are checkboxes without consequences.

Framework gives you: the validation hooks. Harness decides: the safety rules, authorization boundaries, and human-in-the-loop triggers.

You're not choosing between a framework and a harness. You need both. The framework is your team's toolkit. The harness is how your team actually works the process, the standards, the rules of the road that make the toolkit produce consistent results.

The bottom line

Every team building production AI agents is making harness decisions whether they call it that or not. Some make them deliberately, document them, and enforce them consistently. Others make them ad hoc, per agent, per developer and wonder why their agents behave differently across tasks, sessions, and users. The harness is just the name for doing it deliberately.

Thanks
Sreeni Ramadorai