I Let Claude Code Run Unsupervised for 24 Hours. Here's What Happened.
v. Splicer · 2026-05-23 · via DEV Community

The experiment was simple: point Claude Code at a real project, give it a task list, stand back for 24 hours, and document what actually came back. No hand-holding. No mid-session prompts. Just a system prompt, a tool set, and an instruction file that spelled out what I wanted done by morning.

What came back was not what I expected. Some of it was better. Some of it was a mess that required a full afternoon to untangle. All of it was useful data for anyone trying to build persistent, autonomous agent workflows.


The Setup

The project was a Python-based recon automation tool I had been working on incrementally. The backend was functional, but the codebase was disorganized, the output formatting was inconsistent, and I had a backlog of 15 documented issues ranging from minor refactors to one genuinely unpleasant bug in the rate-limiting logic.

Claude Code ran inside a tmux session on a headless Ubuntu VPS. The CLAUDE.md file at the project root defined the task priority order, which directories were off-limits, what the output format for completed tasks should look like, and a hard rule: if it encountered something that required a decision with more than two plausible outcomes, it should stop and write a BLOCKED.md file describing the ambiguity rather than picking arbitrarily.

Tool permissions were scoped intentionally. File read/write for the project directory. Bash execution limited to the virtual environment. No network access beyond localhost. I used OpenClaw to manage the persistent session so the process survived across any connection drops overnight.

The model was claude-sonnet-4-5. Max tokens per call set to 8192. Task file had 15 items.


What It Completed

By hour 6, it had closed 9 of the 15 tasks. The refactors were clean. Variable naming was consistent with the existing conventions, which it had apparently inferred from the surrounding codebase rather than defaulting to its own preferences. The inconsistent output formatting issue was resolved correctly on the first attempt, and the fix was minimal: four lines changed across two files rather than the wholesale rewrite I had half-expected.

The rate-limiting bug was the interesting one. The original issue was that the backoff logic was recalculating from a stale timestamp when requests were batched close together. Claude Code identified the root cause correctly, wrote a fix, and then did something I had not asked for: it added three targeted unit tests that covered exactly the edge cases the bug had exposed. The tests passed. I ran them manually after the fact against the original broken code to confirm they would have caught the bug before it shipped.
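
For readers who have not hit this class of bug, here is a minimal sketch of backoff logic that always measures from the most recent request. It is not the code from the project: the `Backoff` class, its parameters, and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time


class Backoff:
    """Exponential backoff measured from the most recent request (illustrative)."""

    def __init__(self, base_delay=1.0, max_delay=60.0, clock=time.monotonic):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock        # injectable so tests can control time
        self.failures = 0
        self.last_request = None  # timestamp of the most recent request

    def record_request(self, ok):
        # Refresh the timestamp on every request, including ones that are
        # batched close together. A stale-timestamp bug of the kind described
        # would leave this value pointing at an earlier request.
        self.last_request = self.clock()
        self.failures = 0 if ok else self.failures + 1

    def wait_time(self):
        """Seconds still to wait before the next request is allowed."""
        if self.failures == 0 or self.last_request is None:
            return 0.0
        delay = min(self.base_delay * 2 ** (self.failures - 1), self.max_delay)
        elapsed = self.clock() - self.last_request
        return max(0.0, delay - elapsed)
```

Passing a fake clock is what makes edge cases like "two failures half a second apart" testable without sleeping, which is roughly the shape a targeted regression test for this bug would take.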

That is the best-case outcome in this kind of workflow. Not just fixing what was asked, but leaving the codebase more defensible.


Where It Got Stuck

Three tasks produced BLOCKED.md entries. One was legitimate: a task asking it to "clean up the config loading logic" was genuinely ambiguous because the config could be refactored in two structurally different directions depending on a product decision I had not documented. The block note was accurate and well-described. That one I appreciated.

The second block was less impressive. The task involved updating a requirements.txt dependency to the latest compatible version. Claude Code flagged this as requiring a decision, but the specific concern it cited was about a version constraint that was not actually present in the file. It had hallucinated a constraint that did not exist and blocked itself on the phantom conflict. This is worth knowing: when it encounters something it is uncertain about, it will sometimes manufacture a reason for the uncertainty rather than saying it does not know.
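
One way to guard against a phantom constraint is to check the claim against the file before trusting the block note. The sketch below is an assumption-heavy simplification: `declared_constraints` and `constraint_exists` are hypothetical helpers, and the parsing handles only plain `name<specifier>` lines, not the full requirements file syntax.

```python
import re

_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
_SPEC = re.compile(r"(===|==|~=|!=|<=|>=|<|>)\s*[^\s,;#]+")


def declared_constraints(requirements_text):
    """Map each package name to the version specifiers actually written."""
    found = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _NAME.match(line)
        if not match:
            continue
        name = match.group(1).lower()
        rest = line[match.end():].split(";", 1)[0]  # ignore environment markers
        found[name] = [m.group(0).replace(" ", "") for m in _SPEC.finditer(rest)]
    return found


def constraint_exists(requirements_text, package, specifier):
    """True only if the cited specifier is literally present for the package."""
    declared = declared_constraints(requirements_text)
    return specifier.replace(" ", "") in declared.get(package.lower(), [])


if __name__ == "__main__":
    text = "requests>=2.28,<3\nclick  # unpinned\n"
    print(constraint_exists(text, "click", "<9"))  # the cited pin is not in the file
```

A check like this could run over each BLOCKED.md entry that cites a version conflict, so a block based on a constraint that is not in the file gets flagged immediately.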

The third block was a task I had phrased badly. That one is on me.


The Three Tasks It Got Wrong

Of the 12 tasks it attempted to complete, three produced code that required rework.

Two of them were stylistic: the generated code was functionally correct but did not match the surrounding code's error-handling conventions. It caught Exception broadly where the rest of the codebase used specific exception types. That is not a bug, but it introduces technical debt at the same time debt was being reduced elsewhere.
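
To make the convention mismatch concrete, here is an illustrative contrast between the two styles. The function names, the `ConfigError` type, and the JSON layout are all hypothetical and not taken from the project.

```python
import json


class ConfigError(Exception):
    """Hypothetical project-specific error type."""


def load_targets_broad(path):
    # The flagged pattern: every failure, including programming errors,
    # collapses into the same empty result.
    try:
        with open(path) as fh:
            return json.load(fh)["targets"]
    except Exception:
        return []


def load_targets_specific(path):
    # Only the expected failures are handled; anything else propagates.
    try:
        with open(path) as fh:
            return json.load(fh)["targets"]
    except FileNotFoundError as exc:
        raise ConfigError(f"missing file: {path}") from exc
    except json.JSONDecodeError as exc:
        raise ConfigError(f"invalid JSON in {path}") from exc
    except KeyError as exc:
        raise ConfigError(f"no 'targets' key in {path}") from exc
```

Both versions pass a happy-path test, which is why this kind of drift from convention tends to surface in review rather than in the test suite.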

The third was more significant. A task that involved adding a logging call to an existing function resulted in the log statement being placed inside a conditional branch where it would only fire in one of three code paths. The log was there, but it was not where it needed to be to be useful. The test suite did not catch it because the tests covered the happy path. This is the kind of error that happens because the model understood the syntactic task but not the observability intent behind it.

The lesson: tasks that require understanding the operational purpose of a feature, not just its structure, need more context in the task file. "Add logging" is not a task. "Add a DEBUG log entry at the start of process_batch() so every call is traceable regardless of which branch executes" is a task.
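
The difference between the two placements looks roughly like this. The body of `process_batch()` here is invented for illustration, as is the `recon.batch` logger name; only the function name and the DEBUG-at-entry intent come from the task description above.

```python
import logging

logger = logging.getLogger("recon.batch")  # hypothetical logger name


def process_batch_misplaced(items):
    if not items:
        return []
    if len(items) > 100:
        items = items[:100]
    else:
        # The log sits in one branch only, so the other two paths are silent.
        logger.debug("process_batch called with %d items", len(items))
    return [item.strip() for item in items]


def process_batch(items):
    # Log before any branching so every call is traceable.
    logger.debug("process_batch called with %d items", len(items))
    if not items:
        return []
    if len(items) > 100:
        items = items[:100]
    return [item.strip() for item in items]
```

A test that feeds only a small, non-empty batch would see a log line from both versions, which matches how a happy-path suite can miss the misplaced call.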


The Drift Problem

By hour 18, something subtle had happened. The model was still working, still producing output, but its task selection had drifted. Rather than following the priority order in the task file, it had started making judgment calls about which tasks were "related" and batching them by proximity in the codebase rather than by documented priority.

This is not malicious or chaotic. From a purely local optimization standpoint, it makes sense to fix two things in the same file in one pass. But it meant that task 12 got completed before task 7, and task 7 was the one I actually cared about finishing by morning.

Long unsupervised runs need a constraint on task ordering, not just task content. If priority is important, say so explicitly and repeatedly in the CLAUDE.md. "Complete tasks in numbered order. Do not batch by proximity. Do not skip ahead." Vague priority signals do not survive a 24-hour context.
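
Ordering drift is also easy to detect after the fact if you can recover the completion order from your session log. A minimal sketch, assuming tasks are numbered from 1 and that blocked tasks may legitimately be skipped; `ordering_violations` is a hypothetical helper, and the sequence in the usage example is made up.

```python
def ordering_violations(completion_order, blocked=()):
    """List (task, pending) pairs where a task finished ahead of lower-numbered work.

    `blocked` holds task numbers that were legitimately skipped, so they
    never count as pending.
    """
    finished = set(blocked)
    violations = []
    for task in completion_order:
        pending = [t for t in range(1, task) if t not in finished]
        if pending:
            violations.append((task, pending))
        finished.add(task)
    return violations


if __name__ == "__main__":
    # Illustrative sequence: task 12 is completed while 7 through 11 are still open.
    for task, pending in ordering_violations([1, 2, 3, 4, 5, 6, 12, 7]):
        print(f"task {task} finished before {pending}")
```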


What the Log Told Me

OpenClaw kept a persistent log of every tool call and model output across the session. Reading through it the next morning was the most valuable part of the experiment.

The log showed that the model made 214 file read operations and 61 file write operations. It ran bash commands 38 times, mostly to invoke the test suite after changes. Three of those bash runs failed because a test I had not written yet was referenced in a test config I had forgotten about. Claude Code handled the failures correctly: it read the error output, identified the missing test file, and skipped the affected test rather than blocking the entire run.

The log also showed something about pacing. The first 8 hours had dense activity. Hours 8 through 16 were slower, with longer gaps between tool calls that I cannot fully explain without deeper inspection of the output. By hour 20 it had picked back up. Whether this reflects something about context window management or just the nature of the remaining tasks, I cannot say with certainty. It is worth monitoring in future runs.
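
If you want to quantify pacing rather than eyeball it, bucketing tool-call timestamps by hour and listing the longest gaps is a reasonable first pass. This sketch assumes you have already extracted timestamps from the log as `datetime` objects; it makes no claim about the log's actual format, and both helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def calls_per_hour(timestamps, start):
    """Count events in each whole hour elapsed since `start`."""
    counts = Counter()
    for ts in timestamps:
        counts[int((ts - start).total_seconds() // 3600)] += 1
    return dict(sorted(counts.items()))


def longest_gaps(timestamps, top=3):
    """Return the `top` largest (gap, gap_start) pairs between consecutive events."""
    ordered = sorted(timestamps)
    gaps = [(later - earlier, earlier) for earlier, later in zip(ordered, ordered[1:])]
    return sorted(gaps, reverse=True)[:top]


if __name__ == "__main__":
    # Made-up timestamps purely to show the call shape.
    start = datetime(2026, 1, 1)
    stamps = [start + timedelta(minutes=m) for m in (5, 20, 50, 130, 400)]
    print(calls_per_hour(stamps, start))
    print(longest_gaps(stamps, top=1))
```

Lining up the longest gaps against the task being worked on at the time would help separate "the remaining tasks were harder" from something about the session itself.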


What This Workflow Is Actually Good For

Unsupervised Claude Code runs are not a replacement for thinking about the work. Output quality tracks input quality: task specificity, codebase context, and the CLAUDE.md constraints all matter more than most people expect going in.

Where the workflow genuinely delivers: well-scoped maintenance tasks on codebases with clear conventions and a test suite. Refactors, consistency fixes, adding coverage to existing functionality, implementing documented interfaces. Tasks where "correct" has a verifiable definition.

Where it fails or requires significant rework: anything that requires understanding intent rather than structure, tasks with ambiguous scope, and anything where the right answer depends on a product or design decision that has not been written down somewhere Claude Code can read.

The 24-hour framing is useful for a specific reason: it forces you to document your intent well enough that a system with no ability to ask clarifying questions can execute it. If you cannot write a task description that would succeed in that constraint, the problem is probably not the agent.


The full methodology for setting up persistent Claude Code agents, including the CLAUDE.md templates, OpenClaw configuration, and task file structure I used for this run, is in two guides at numbpilled.gumroad.com.

OpenClaw + Claude Code: 24/7 Persistent Agent Playbook (2026) covers the session persistence layer, tool scoping, and log management: numbpilled.gumroad.com/l/openclaw-claude-code

Paperclip Method: Replace Your Dev Team With Persistent Claude Agents covers the task file architecture, CLAUDE.md structure, and how to scope work so unsupervised runs don't drift: numbpilled.gumroad.com/l/paperclip-claude-method

Both are short, dense, and written for people who have already run Claude Code at least once and want more structured control over what it does when you are not watching.