惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

U
Unit 42
S
Securelist
小众软件
小众软件
WordPress大学
WordPress大学
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
B
Blog
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
The GitHub Blog
The GitHub Blog
Apple Machine Learning Research
Apple Machine Learning Research
博客园 - 司徒正美
博客园 - Franky
Hugging Face - Blog
Hugging Face - Blog
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
酷 壳 – CoolShell
酷 壳 – CoolShell
O
OpenAI News
Cloudbric
Cloudbric
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
TaoSecurity Blog
TaoSecurity Blog
MongoDB | Blog
MongoDB | Blog
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
V
V2EX
PCI Perspectives
PCI Perspectives
T
Troy Hunt's Blog
Schneier on Security
Schneier on Security
P
Palo Alto Networks Blog
M
MIT News - Artificial intelligence
V2EX - 技术
V2EX - 技术
阮一峰的网络日志
阮一峰的网络日志
Hacker News - Newest:
Hacker News - Newest: "LLM"
G
Google Developers Blog
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
The Last Watchdog
The Last Watchdog
The Register - Security
The Register - Security
腾讯CDC
N
News and Events Feed by Topic
C
Check Point Blog
爱范儿
爱范儿
T
Tailwind CSS Blog
Webroot Blog
Webroot Blog
P
Proofpoint News Feed
S
Schneier on Security
MyScale Blog
MyScale Blog
N
News | PayPal Newsroom
Recorded Future
Recorded Future
T
Tenable Blog
I
InfoQ
www.infosecurity-magazine.com
www.infosecurity-magazine.com
Microsoft Security Blog
Microsoft Security Blog
Simon Willison's Weblog
Simon Willison's Weblog
Engineering at Meta
Engineering at Meta

DEV Community

Authentication Security Deep Dive: From Brute Force to Salted Hashing (With Java Examples) Why AI Systems Don’t Fail — They Drift Spilling beans for how i learn for exam😁"Reinforcement Learning Cheat Sheet" I Replaced Chrome with Safari for AI Browser Automation. Here's What Broke (and What Finally Worked) How Python Borrows Other People's Work The $40 Architecture: Processing 1 Billion API Requests with 99.99% Uptime Vibe Coding: A Workflow Guide (From Zero to SaaS) Most webhook security guides protect the wrong side. The scary part is delivery. Headless CMS for TanStack Start: Build a Blog with Cosmic EU Age Verification App "Hacked in 2 Minutes" — What Actually Happened Comfy Cloud’s delete function does not actually remove files Running AI Models on GPU Cloud Servers: A Beginner Guide Event-driven media intelligence with AWS Step Functions and Bedrock I scored 500 AI prompts across 8 quality dimensions — here's what broke How to Call Google Gemini API from Next.js (Free Tier, No Backend Needed) The Portal Protocol: Reclaiming Human Connection in the Age of AI How to Fix Your Team's Scattered Knowledge Problem With a Self-Hosted Forum Intro to tc Cloud Functors: A Graph-First Mental Model for the Modern Cloud Designing Multi-Tenant Backends With Both Ownership and Team Access I Built a Neumorphic CSS Library with 77+ Components — Here's What I Learned PostgreSQL Performance Optimization: Why Connection Pooling Is Critical at Scale Cómo construí un SaaS multi-rubro para gestionar expensas en Argentina con FastAPI + Vue 3 🚀 I Built an Ethical Hacking Scanner Tool – Open Source Project I Replaced /usage and /context in Claude Code With a Single Statusline A Pythonic Way to Handle Emails (IMAP/SMTP) with Auto-Discovery and AI-Ready Design I Collected 8.9 Million Polymarket Price Points — Here's What I Found About How Markets Really Move EcoTrack AI — Carbon Footprint Tracker & Dashboard Everyone's Using AI. No One Agrees How. 5 self-hosted ebook managers worth trying in 2026 Building Your First AI Agent with LangChain: From Chatbot to Autonomous Assistant Common SOC 2 Failures (Real World) Stop Vibe-Checking Your AI App: A Practical Guide to Evals How to Use SonarQube and SonarScanner Locally to Level Up Your Code Quality Your Next To-Do App Is Dead — I Replaced Mine with an OpenClaw AI Sign a Nostr event in 60 lines of Python using coincurve — no nostr-sdk, no nbxplorer, no rust toolchain ITGC Audit Explained Like You’re in Big 4 Patch Tuesday abril 2026: Microsoft parcha 163 vulnerabilidades y un zero-day en SharePoint Stop scraping everything: a better way to track competitor price changes Listing on MCPize + the Official MCP Registry while routing payments OUTSIDE the marketplace — how I kept 100% of my x402 revenue Building an AI-Powered Risk Intelligence System Using Serverless Architecture Why We Ripped Function Overloading Out of Our AI Toolchain Testing AI-Generated Code: How to Actually Know If It Works SaaS Churn Is Killing Your Business. Here Is What to Do About It (Without a Support Team) The Speed of AI Is No Longer Linear - And Self-Improving Models Are Why How to Implement RBAC for MCP Tools: A Practical Guide for Engineering Teams From Standard Quote to Persuasive Proposal: AI Automation for Arborists I built a CLI that scaffolds complete multi-tenant SaaS apps Axios CVE-2025–62718: The Silent SSRF Bug That Could Be Hiding in Your Node.js App Right Now The dashboard that ended our friendship Data Pipelines Explained Simply (and How to Build Them with Python) The Hidden Cost of AI Systems Nobody Talks About. undefined vs undeclared, and how typeof behaves Switching from file-based jobs to NATS/Kafka in Rust without changing code io_uring Adventures: Rust Servers That Love Syscalls Why Agentic AI is Killing the Traditional Database The POUR principles of web accessibility for developers and designers Quantum Neural Network 3D — A Deep Dive into Interactive WebGL Visualization How To Install Caveman In Codex On macOS And Windows Automation Pipeline Reliability: Why Your Workflow Breaks When Nobody Is Watching I Built an 'Open World' AI Coding Agent — It Works From ANY Folder From Freelancing to Product: A Tech Service Company's SaaS Transformation China's AI Giants: Adding Tencent Hunyuan & ByteDance Doubao to AI University (74 Providers) On the Vibe Coders and Their Lies clerk: Auto-Summarize Your Claude Code Sessions AI Weekly — 2026/04/10–04/17 | The Model Lockdown Is Here, but the Toolchain Is the Real Battleground AI 週報 — 2026/04/10–2026/04/17 模型封鎖潮來了,但工具鏈才是真戰場 Maybe this is how Open-Source apps are born... 🚀 Fine-Tune LLMs with LoRA and QLoRA: 2026 Guide tRPC v11 + Next.js App Router: End-to-End Type Safety Without the Boilerplate ShadCN UI in 2026: Why I Stopped Installing Component Libraries and Started Owning My Components SaaS Billing in React Server Components: Stripe + Supabase Without a Single `useEffect` Join our DEV Weekend Challenge — $1,000 in Prizes Across TEN winners! Submissions Due April 20 at 6:59 AM UTC. Implementing FSRS Spaced Repetition in Flutter + Supabase — Adding Memory Science to an AI Learning App "I Texted My Localhost From the Train — Claude Code Fixed the Bug Before I Got Home" I Built a Sales Prep AI and It Went Deeper Than Expected Design to Code #2: One JSON, Eleven Outputs Solving the 100M-Row Problem: A Summary Table Pattern for High-Volume Push Notification Logs Flutter Web With Wasm: What Actually Changes For Developers I Built 50 Royalty-Free Soundtracks for My Side Project in a Weekend Using AI Music Generation The Vibe Coding Security Checklist: 7 Things to Check Before You Ship Stop Letting Googlebot Guess Fix Your React App's SEO Right Desconstruindo o Streaming do LinkedIn: Como Criar um Engine de Extração de Vídeo de Alta Performance com HLS e FFmpeg (EDA Part-1) EDA (Exploratory Data Analysis) Explained With Real Life — Why Looking at Your Data Is the Most Important Step in Machine Learning Brand Relationship Management at Scale: Our 4-Touch Outreach System for 200+ Brands Why String.fromEnvironment() Might Return an Empty String in Dart JGuardrails 1.0.0 — Hardening Java LLM Apps Against Jailbreaks, Toxicity, and Prompt Injection Plan and Schedule a Full Week of Threads Content From One Claude Conversation Coding Cat Oran Ep3, Five Tables Changed Everything Updated: BFF Pattern I'm done watching freelancers get buried by 200 proposals. So I'm building the alternative. This is my first post BFS Algorithm in Java Step by Step Tutorial with Examples Tracking LLM Pricing Monthly: An Open Dataset for 22 AI Models How We Measure Content ROI on a Comparison Site: Revenue Attribution Without Perfect Data Introducing Nova AI Ops: The AI-Native Operating System for SRE Teams I built a free desktop video downloader for Windows — Grabbit How Talkie OCR Helps Vision-Impaired & Dyslexic Users Read the World Around Them VRCFaceTracking安装和iPhone面捕配置教程,有bug Even CrowdStrike Can't See Your Agents The Automation Gold Rush: What n8n Workflows and Claude Are Opening Up for Developers Right Now
How to align coding agents with your plans better than markdown, without burning tokens
Mixture of E · 2026-05-15 · via DEV Community

The expensive moments in a coding-agent session are not the model's tokens. They are the seconds you spend skimming a markdown plan and missing a subtle misalignment. You approve, then watch the implementer solve a slightly different problem than the one in your head.

We have started treating that gap as a UI problem, not a model problem. And the UI we have, for coding agents specifically, is bad.

Thariq Shihipar at Claude Code has been making this case publicly for a while: agents should be emitting HTML, not markdown, for most non-trivial output. His thread is the right primer on why, and we're not going to try to re-derive it here. What we want to add is the piece that has been missing for us. We needed a way to use HTML at every plan stage without the token cost stacking up across the session. That way is a screenshot, borrowed from how DeepSeek-OCR handles context compression.

Thariq's article.

The case Thariq makes, in three parts. We will not reproduce Thariq's thread in full. We suggest reading it. The arguments worth restating here are the ones the rest of this post leans on:

  1. Markdown won by inertia. It rendered everywhere, was easy for a human to hand-edit, and the kinds of plans agents used to produce were short. None of that still binds. Most people are no longer hand-editing agent-generated specs, they are prompting the agent to edit them. Plans have grown into full RFCs. And every modern reviewer has a browser tab open.

  2. HTML carries information markdown cannot. Tables with real column alignment, SVG diagrams drawn to scale, before/after panels rendered side by side at the same visual weight. In the absence of those, agents fall back to ASCII boxes and unicode block characters approximating colors. That fallback is what most markdown plans actually look like at length, and it is why nobody reads past line 100.

  3. Information density matters most at the plan stage. This is where the gap between what the agent thinks you want and what you actually want is widest. Forcing the plan through a flat-text encoding is a lossy compression step you do not need to be performing.

Thariq catalogs the use cases: plan stages with branching options, design and prototype reviews, PR walkthroughs, code and architecture explainers, throwaway custom editors that end with a "copy as JSON" button. We have ended up using HTML for all of those. Our experience matches his closely enough that the right move is to point you at his thread rather than re-list them.

Where this landed for us: design work with a coding agent

The plan-stage argument is the one that converted us, and design work is where it shows up most starkly.

The last time we were iterating on a UI change with Claude Code, we asked for the plan as a single-file HTML artifact instead of the usual markdown. Two columns, BEFORE on the left, AFTER on the right, rendered with the real tokens and chrome the UI actually ships.

The point is not the specific feature. The point is that one artifact got us to high-fidelity comprehension in a single round trip. The markdown equivalent would have been a paragraph of prose and a bullet list. Readable, but lossy in exactly the ways that matter for a visual change. Getting to the same level of confidence through markdown would have taken three or four back-and-forth turns of "what does this look like next to X" and "show me the spacing," each one re-tokenizing the conversation and giving us a worse mental picture than the rendered comparison did instantly.

The expensive operation is reading the spec and noticing what the agent got wrong. Spending model tokens on rendered HTML pays for itself the first time it replaces three turns of "what does this look like next to X" with one look.

Where Thariq's argument gets harder: token cost on long sessions

HTML is not free. A single artifact comparing two design approaches with inline styles, SVG, and full content runs roughly four to six times the tokens of the equivalent markdown plan. Generation also takes two to four times longer. On a one-shot artifact that's fine. On a long coding-agent session, the plan gets re-read by the implementer, then the reviewer, then the follow-up planner. The HTML keeps getting re-tokenized into context, and the cost stacks up across the session.

This is the part Thariq's posts don't fully address, and it's why HTML stayed a sometimes-tool for us instead of a default. The fix came from a different research direction.

DeepSeek-OCR is the missing mechanism

DeepSeek-AI's paper DeepSeek-OCR: Contexts Optical Compression makes a simple claim: a page of text rendered as an image and processed by a vision encoder can be encoded into far fewer tokens than the same text processed as text. Their model card lists the encoding modes. A 1024x1024 image of a full page becomes 256 vision tokens. Their Tiny mode does it in 64. For content that has visual structure, the image channel encodes more per token than the text channel by a wide margin.

Paper: https://arxiv.org/abs/2510.18234
Model card: https://github.com/deepseek-ai/DeepSeek-OCR

You do not need to run their model to borrow the mechanism. Once you have an HTML artifact you are happy with, you do not need to keep the HTML itself in context for subsequent agent calls. Render it, screenshot it, feed the PNG back as an image. The vision tokens encode the same spec at a fraction of the text-token cost, and the human-readable HTML is preserved on disk for the next time you need to iterate.

The workflow we have settled into:

  1. Agent generates the HTML artifact as part of the plan stage.
  2. We open it in a browser, review, edit if needed, approve.
  3. A small wrapper renders the artifact and captures a PNG.
  4. Subsequent agent calls receive the PNG as part of the spec, not the raw HTML.

The trade is asymmetric. Our review happens against the rendered HTML, where spacing, alignment, and color do the work of catching the misalignments. The model's re-reads across the implementer and reviewer stages happen against the screenshot, which costs a fraction of the text tokens. Iteration cost stays close to a markdown plan. What we can see in one glance goes way up.

This is what moved HTML artifacts from "nice when I remember to ask for one" to "default at every plan stage" for us.

Why coding-agent TUIs have not shipped this yet

Claude chat ships artifacts. ChatGPT canvas ships canvas. The chat side of the ecosystem worked this out a while ago: prose-only loses information at exactly the moments that matter most.

The coding-agent TUIs (Claude Code, Codex, Opencode, etc.) are still markdown-first across every stage of the loop. Part of the reason is that TUIs render in terminals, and terminals do not render HTML. But the artifact does not need to live inside the TUI. A hook that drops the file in a browser tab or a side panel solves the rendering problem. The harder constraint is that the agent has to know when an HTML artifact is the right tool, and most plan-stage prompts never ask for one. The default is markdown, the path of least resistance is markdown, and you find out about the misalignment after the implementer is halfway done.

In the short term the fix is one line in your plan-stage prompt: ask for a single-file HTML artifact when the problem is comparison-heavy, visual, or architecturally branching. Then add the screenshot step before the artifact gets re-read by downstream agent calls. In the longer term we want the agents to reach for HTML on their own, the way Claude already does in chat.

Try it on the next ambiguous plan

The pattern is cheap to try in one session. The next time an agent hands you a markdown plan for something you would want to compare, draw, or render, ask for a single-file HTML artifact instead. Open it in a browser. Read the rendered comparison rather than the prose abstraction of it. If the HTML changes your read on the plan, that is what markdown was hiding.

Then screenshot it before the next agent stage reads it back. The screenshot is what makes this the default at every plan stage, instead of a tool you only reach for when the artifact feels important enough to justify the tokens.

References

[1] Thariq Shihipar (Claude Code), The Unreasonable Effectiveness of HTML: https://x.com/trq212/status/2052809885763747935 — The case for HTML over markdown as the default agent output format, with a catalog of use cases.

[2] DeepSeek-AI, DeepSeek-OCR: Contexts Optical Compression. arXiv: https://arxiv.org/abs/2510.18234 — GitHub: https://github.com/deepseek-ai/DeepSeek-OCR — The mechanism behind the screenshot trick: visual tokens encode page-structured content at a fraction of the text-token cost.