Harness Engineering
hateonion · 2026-04-14 · via OnionTalk

I’ve seen a lot of posts recently — in the Chinese-speaking world and the English-speaking world — talking about harness engineering.

Most of them just cover the concept, with very little detail.

And honestly, some of the systems described are so complex I can’t believe they’d keep running without crashing.

So over the weekend, I built something myself.


It’s funny — I’ve built a lot of personal projects, but most of them die within 10 iterations. When I say dead, I mean they spiral: adding a feature causes bugs, fixing those bugs creates more bugs.

That’s exactly the spiral harness engineering is supposed to prevent.

People make it sound complicated. The way I built it is pretty dumb and simple, but it works. And I did learn a lot from the original post.


Rule 1: Make the worktree bootable.

This is critical if you want agents to work in parallel.

Use bun (or pnpm) as the frontend package manager: the shared cache means dependencies install in a flash instead of being downloaded again for every worktree. Use uv to manage the data pipeline environment and dependencies. Every worktree starts instantly, no pain.
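Here is a minimal sketch of what "bootable" could look like as a script. The `web/` and `pipeline/` directory names, the sibling-directory worktree location, and the `bootstrap_commands` helper are all assumptions for illustration, not the layout of the actual project. Only `git worktree add`, `bun install`, and `uv sync` are real commands.

```python
from pathlib import Path
import subprocess


def bootstrap_commands(repo: Path, name: str) -> list[tuple[list[str], Path]]:
    """Return (command, working directory) pairs that create and boot a worktree.

    Assumed layout (hypothetical): web/ holds the frontend,
    pipeline/ holds the uv-managed data pipeline.
    """
    worktree = repo.parent / f"{repo.name}-{name}"
    return [
        # Create the worktree on a fresh branch named after the task.
        (["git", "worktree", "add", "-b", name, str(worktree)], repo),
        # Install frontend dependencies exactly as pinned in the lockfile.
        (["bun", "install", "--frozen-lockfile"], worktree / "web"),
        # Create the virtualenv and install the pipeline dependencies.
        (["uv", "sync"], worktree / "pipeline"),
    ]


def bootstrap(repo: Path, name: str) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd, cwd in bootstrap_commands(repo, name):
        subprocess.run(cmd, cwd=cwd, check=True)
```

Keeping the command list separate from the runner makes it easy to print the plan for an agent, or to check it in a test, without touching the repository.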


Rule 2: Set the standard.

Treat it like you’re handling a ticket end-to-end. Lint, unit test, passing build — those are the basics. But for an app with UI, the critical one is UI testing.

Here I use agent browser and Playwright CLI. Both differ from traditional e2e tests, which go stale quickly when selectors change. You pair with the tool for a round or two, describe how your app looks, and document that in a markdown file.

After that, you just say: “I expect if I click X, Y should pop up.”

All natural language.


Those two rules took most of my time. The rest — tech stack, infrastructure — I made those decisions myself. Agents write the code; architecture is still mine.

Then things started rolling.


I created a skill called decision-plan.

Give it a minimum requirement (including P0 use cases), and it will:

  • Explore the codebase
  • Create an ADR (Architecture Decision Record) based on the requirement
  • Generate an execution plan

The ADR covers high-level design, context, and the why. The execution plan covers the how — which files to change. The execution plan links back to the ADR. Both live inside the codebase as code.
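Since the plan-to-ADR link is what holds the two documents together, it can be worth checking mechanically. The sketch below flags execution plans that do not mention any existing ADR. The `ADR-0012` style of ID matches the one used later in this post, but the directory arguments, the file naming, and the `unlinked_plans` helper are assumptions of mine.

```python
import re
from pathlib import Path

# IDs look like ADR-0012; the four-digit width is assumed from that example.
ADR_ID = re.compile(r"ADR-\d{4}")


def unlinked_plans(adr_dir: Path, plan_dir: Path) -> list[str]:
    """Return names of plan files that reference no existing ADR."""
    known = set()
    for adr_file in adr_dir.glob("ADR-*.md"):
        match = ADR_ID.match(adr_file.name)
        if match:
            known.add(match.group())

    missing = []
    for plan in sorted(plan_dir.glob("*.md")):
        referenced = set(ADR_ID.findall(plan.read_text(encoding="utf-8")))
        # A plan counts as linked only if it names an ADR that actually exists.
        if not referenced & known:
            missing.append(plan.name)
    return missing
```

Run as a lint step, this would catch both a plan with no ADR reference and a plan pointing at an ADR that was renamed or never written.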

I also created an index.md — a lightweight index so the agent can find the right ADR without scanning through all of them. As ADRs accumulate, they work as a knowledge base — and new ADRs automatically start referencing old ones when the skill is called.
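One way to keep such an index cheap to maintain is to regenerate it from the ADR files themselves. This is a sketch under assumptions: that ADR files are named `ADR-*.md`, and that each one carries its title in its first Markdown heading. The `build_index` helper is hypothetical, not part of the skill itself.

```python
from pathlib import Path


def build_index(adr_dir: Path) -> str:
    """Build a one-line-per-ADR index from each file's first Markdown heading."""
    lines = ["# ADR index", ""]
    for adr_file in sorted(adr_dir.glob("ADR-*.md")):
        title = adr_file.stem  # fall back to the file name if no heading exists
        for line in adr_file.read_text(encoding="utf-8").splitlines():
            if line.startswith("# "):
                title = line[2:].strip()
                break
        lines.append(f"- [{title}]({adr_file.name})")
    return "\n".join(lines) + "\n"
```

Writing the result to `index.md` gives the agent one small file to read before it decides which ADRs to open in full.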


Then the building phase.

I created another skill: build-and-ship. It runs:

  1. Create a worktree and make sure it’s bootable
  2. Read the execution plan
  3. Implement the plan
  4. Lint / unit test / build
  5. Bootstrap a dev server (random port) and run smoke tests (Playwright)
  6. Use agent browser or Playwright CLI to visually verify. Save screenshots and recordings.
  7. If everything passes, merge back to main. If anything fails, fix it.
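The gate logic in steps 4 to 7 can be sketched in a few lines. This is not the skill itself, which is written as instructions for the agent. It only illustrates two mechanics: picking a random free port so parallel worktrees don't collide, and stopping at the first failing gate. The `pick_free_port` and `run_gates` names are hypothetical.

```python
import socket
from typing import Callable, Optional


def pick_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0.

    The port is released when the socket closes, so another process could
    still grab it before the dev server starts; retry if that happens.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def run_gates(gates: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run gates in order; return the first failing gate's name, or None."""
    for name, check in gates:
        if not check():
            return name  # later gates are skipped until this one is fixed
    return None
```

Each gate would wrap one real check (lint, unit tests, build, smoke tests). A `None` result means the branch is safe to merge; anything else names the gate the agent has to fix before trying again.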

My whole workflow: open 4 terminals, run decision-plan in parallel to generate ADRs and execution plans, clear context when done. Then kick off /build-and-ship ADR-0012. And it works. 48 ADRs later, still going.

Execution plans accumulating in the codebase


I also tried a simple Ralph loop — combining 4 ADRs into a PRD and letting it run autonomously. The loop itself ran fine, but the task splitting wasn’t ideal. Some validation points got missed, and bugs piled up.

I think the ADR/workflow structure isn’t far from supporting a fully autonomous solution. Just not quite there yet.


So what did I actually build with all this? InvestBuddy — import your broker data, track performance, export for tax. Everything stays in the browser.


What did I learn?

1. Good requirements + clear validation points are the best resource you can give an agent.

2. I’m no longer deep in the code details. I’m more like a PM.

I felt like I’d lost control at first. But I got used to it — and I’m actually happier watching things I built work for my own problems.

3. Simple is still better than complex.

The ADR system probably won’t scale to hundreds of ADRs — context limits will bite, agent memory will decay. But it works for this project, and that’s enough.

Build it simple. Make it work. When it breaks, fix it.