llm-nano-vm v0.8.0 — deterministic FSM runtime for LLM pipelines, now with output validation and per-step timeouts
Alex Delov · 2026-05-23 · via DEV Community


I've been building a deterministic FSM execution kernel for LLM workflows. v0.8.0 just shipped to PyPI. Here's what it is, what's new, and where it's going.


What it is

Most LLM frameworks treat the model as the orchestrator. nano-vm flips that: the runtime is the orchestrator, the model is just one step in a deterministic graph.

δ(S, E) → S'

Current state + validated event = next state. The model cannot skip steps, reorder them, or escape guardrails. The FSM is the source of truth.
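
As a rough illustration of the δ(S, E) → S' idea (a sketch, not nano-vm's actual code), a transition function can be a pure lookup over a fixed table. The state names come from this post; the event names and the `delta` helper are hypothetical.

```python
from enum import Enum


class State(str, Enum):
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


# Hypothetical transition table: (current state, event) -> next state.
# The event names are invented for this sketch.
TRANSITIONS = {
    (State.RUNNING, "step_ok"): State.RUNNING,
    (State.RUNNING, "pending"): State.SUSPENDED,
    (State.SUSPENDED, "resume"): State.RUNNING,
    (State.RUNNING, "terminal_ok"): State.SUCCESS,
    (State.RUNNING, "error"): State.FAILED,
}


def delta(state: State, event: str) -> State:
    """Pure transition: the same (state, event) pair always gives the same result."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        # Anything not in the table is rejected instead of being improvised.
        raise ValueError(f"illegal transition: {state.value} on {event!r}") from None
```

Because every legal move is listed up front, an event the table does not know about (for example, resuming an execution that already finished) is an error, not a judgment call left to the model.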

Four step types: llm, tool, condition, parallel. Programs are plain Python dicts. No DSL parser, no heavy framework magic, and zero dependency overhead.

program = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {
            "id": "analyze",
            "type": "llm",
            "prompt": "Valid refund? Reply 'yes' or 'no'.\nRequest: $user_input",
            "output_key": "decision",
            "allowed_outputs": ["yes", "no"],   # ← v0.8.0
        },
        {
            "id": "guardrail",
            "type": "condition",
            "condition": "'yes' in '$decision'",
            "then": "process_refund",
            "otherwise": "reject",
        },
        {"id": "process_refund", "type": "tool", "tool": "issue_refund",   "is_terminal": True},
        {"id": "reject",         "type": "tool", "tool": "send_rejection", "is_terminal": True},
    ],
})

The guardrail step cannot be bypassed regardless of what the model returns.

What's new in v0.8.0

allowed_outputs — LLM enum guard

Validates the model's raw output against an explicit list before the value touches anything downstream.

{
    "id": "classify",
    "type": "llm",
    "prompt": "Classify. Reply ONLY with: refund / query / other",
    "allowed_outputs": ["refund", "query", "other"],
    "on_error": "skip",   #  falls back to "refund" (first element) on mismatch
}

Three policies on mismatch: fail (default, trace → FAILED), skip (substitute allowed_outputs[0], the first element), retry (retry up to max_retries, then FAILED).
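
The three policies can be sketched in a few lines of standalone Python. This is an illustration under stated assumptions, not the runtime's implementation: `guard_output`, `OutputRejected`, and `retry_call` are hypothetical names, and exact string matching is assumed (the real runtime may normalize output differently).

```python
from typing import Callable, Optional, Sequence


class OutputRejected(Exception):
    """Hypothetical error standing in for the trace moving to FAILED."""


def guard_output(
    raw: str,
    allowed: Sequence[str],
    on_error: str = "fail",
    retry_call: Optional[Callable[[], str]] = None,
    max_retries: int = 0,
) -> str:
    """Return a value guaranteed to be in `allowed`, or raise OutputRejected."""
    if raw in allowed:  # exact match is an assumption of this sketch
        return raw
    if on_error == "skip":
        return allowed[0]  # substitute the first allowed value
    if on_error == "retry":
        if retry_call is None:
            raise ValueError("retry policy needs a retry_call")
        for _ in range(max_retries):
            candidate = retry_call()  # ask the model again
            if candidate in allowed:
                return candidate
    elif on_error != "fail":
        raise ValueError(f"unknown on_error policy: {on_error!r}")
    raise OutputRejected(f"{raw!r} not in {list(allowed)!r}")
```

Whatever the policy, the function either returns a member of the list or raises, so downstream steps never see free-form model text.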

timeout_seconds + on_timeout — per-step LLM timeout

Prevents a hung API call from stalling the entire FSM.

{
    "id": "analyze",
    "type": "llm",
    "timeout_seconds": 5.0,
    "on_timeout": "fallback",   #  falls back to allowed_outputs[0] or ''
}

Two policies: fail (default) and fallback. Both features are independent and composable — you can use either or both on any llm step.
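
For readers who want to see the shape of a per-step timeout, here is a minimal sketch using `asyncio.wait_for` from the standard library. It assumes the LLM call is an async callable; `call_with_timeout` is a hypothetical helper, and how nano-vm enforces the timeout internally is not shown in this post.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def call_with_timeout(
    call: Callable[[], Awaitable[str]],
    timeout_seconds: float,
    on_timeout: str = "fail",
    allowed_outputs: Sequence[str] = (),
) -> str:
    """Run one LLM call with a deadline; apply the fallback policy if it expires."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        if on_timeout == "fallback":
            # Mirrors the rule above: first allowed output, else empty string.
            return allowed_outputs[0] if allowed_outputs else ""
        raise  # "fail": let the caller mark the step as failed
```

Usage would look like `asyncio.run(call_with_timeout(my_llm_call, 5.0, "fallback", ["yes", "no"]))`, where `my_llm_call` is whatever coroutine function wraps the provider request.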

What it can do right now

  • Suspend / resume. Return "PENDING" from any tool and the FSM moves to SUSPENDED with its cursor persisted. Resume on any external event (webhook, approval, settlement). Lifecycle: RUNNING → SUSPENDED → RUNNING → SUCCESS.
  • Condition branching with ASTEngine. eval() is gone. Conditions are parsed into a validated JSON AST and evaluated by a sandboxed interpreter, with no Python builtins accessible. Method calls (.lower() etc.) raise ASTEvalError at parse time instead of silently returning False.
  • GDPR tombstoning. Sensitive values stored as CapabilityRef tokens (vault://secret/). On erasure event: ref tombstoned, all projections return [REDACTED_TOMBSTONE], hash chain stays valid.
  • GovernanceEnvelope. Every successful step produces an immutable, append-only audit record: execution_id, step_id, policy_hash, canonical_snapshot_hash, sanitized payload.
  • MCP gateway (nano-vm-mcp). Exposes run_program, get_trace, list_programs etc. over stdio or SSE transport with bearer auth and SQLite WAL persistence. Works with Claude Desktop and any MCP client.
  • Budget guardrails. max_steps, max_tokens, max_stalled_steps — FSM halts with BUDGET_EXCEEDED or STALLED before the next step, not after.
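
To make the ASTEngine bullet concrete, here is a small stand-in built on Python's own `ast` module. It is only a sketch of the whitelist idea: the real engine uses a JSON AST whose schema is not shown in this post, and `parse_condition`, `evaluate_condition`, and the set of allowed operators are assumptions. Only the `ASTEvalError` name is borrowed from above.

```python
import ast


class ASTEvalError(Exception):
    """Raised for any construct outside the whitelist."""


# Whitelist: literals, boolean logic, and a few comparisons. No names, no calls.
_ALLOWED = (
    ast.Expression, ast.Constant, ast.Compare, ast.BoolOp, ast.UnaryOp,
    ast.And, ast.Or, ast.Not, ast.Eq, ast.NotEq, ast.In, ast.NotIn,
)
_OPS = {
    ast.Eq: lambda a, b: a == b,
    ast.NotEq: lambda a, b: a != b,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def parse_condition(expr: str) -> ast.Expression:
    """Parse and validate up front, so bad conditions fail before anything runs."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        raise ASTEvalError(f"syntax error: {exc.msg}") from None
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ASTEvalError(f"disallowed construct: {type(node).__name__}")
    return tree


def _evaluate(node):
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _evaluate(node.operand)
    if isinstance(node, ast.BoolOp):
        values = (_evaluate(v) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        left = _evaluate(node.left)
        for op, comparator in zip(node.ops, node.comparators):
            right = _evaluate(comparator)
            if not _OPS[type(op)](left, right):
                return False
            left = right
        return True
    raise ASTEvalError(f"cannot evaluate: {type(node).__name__}")


def evaluate_condition(expr: str) -> bool:
    tree = parse_condition(expr)
    try:
        return bool(_evaluate(tree))
    except TypeError as exc:  # e.g. "'a' in 5"
        raise ASTEvalError(str(exc)) from None
```

With this shape, a condition such as `'yes' in 'yes'` (the guardrail condition after `$decision` has been substituted) evaluates normally, while `'YES'.lower() == 'yes'` is rejected during parsing because `Call` nodes are not on the whitelist.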

Benchmark — v0.8.0 (WSL2 · Python 3.12 · MockAdapter · 3×5×10k)
10/10 PASS · 1,096,500 ops · 0 violations

Scenario                   Mean TPS    p95
Refund pipeline            2,200/s     123 ms
Double-execution guard     2,800/s     69 ms
Budget enforcement         2,400/s     97 ms
Parallel throughput        1,000/s     196 ms
MCP store round-trip       11,000/s    0.13 ms
GovernanceEnvelope         2,100/s     108 ms
Crash consistency          11/s        115 ms
Replay equivalence         1,300/s     164 ms
Adversarial retries        2,600/s     87 ms
Long-horizon (1k steps)    95/s        11,887 ms

BM-INT-07 (Crash consistency): crash_rate=100% hash_match=100% — replay after simulated crash produces identical trace hash every time.
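
The benchmark harness itself is not shown here, but the property it checks can be illustrated with a toy hash chain. In this sketch (all names hypothetical, record fields simplified), each step record is hashed together with the previous hash, so a run that crashes and resumes from its persisted cursor ends on the same final hash as an uninterrupted run.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def chain_hash(prev_hash: str, record: dict) -> str:
    """Hash one step record together with the previous link."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def run(steps, start=0, head=GENESIS, crash_at=None):
    """Execute steps from `start`; return (cursor, head hash).

    If `crash_at` is set, stop before that step to simulate a crash. The
    returned cursor and hash are what a real runtime would have persisted.
    """
    for index in range(start, len(steps)):
        if crash_at is not None and index == crash_at:
            return index, head
        head = chain_hash(head, {"step_id": steps[index], "index": index})
    return len(steps), head


steps = ["analyze", "guardrail", "process_refund"]
_, clean_hash = run(steps)                           # uninterrupted run
cursor, partial_hash = run(steps, crash_at=2)        # crash before step 2
_, resumed_hash = run(steps, start=cursor, head=partial_hash)
```

Here `resumed_hash == clean_hash`, while changing any step record along the way produces a different final hash.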

BM-INT-10 (Memory footprint): peak RSS 76.5 MB, alloc 3.62 MB for 1,000-step programs — no memory leaks detected.

Validated on real payment APIs

Two PoCs, both 9/9 tests passing with mock adapters:

  • MoMo Payment API v4: 3-way condition branch, HMAC-SHA256 IPN verification, polling loop with retry, next_step/is_terminal DSL.
  • Stripe Payment API v1: 3DS flow (REQUIRES_ACTION sentinel), refund pipeline with LLM classifier, webhook verification. Found and fixed two bugs in the process: "PENDING" sentinel collision (Stripe was returning it as a domain status, triggering FSM suspend), and silent ASTEvalError for .lower() in condition expressions.
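
For context on the HMAC-SHA256 verification mentioned above, the generic pattern with Python's standard library looks like the sketch below. How each provider builds the signed string (field order, encoding, headers) is provider-specific and not covered in this post, so `message` is just a placeholder and the helper names are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of `message` under `secret`."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, received_signature: str) -> bool:
    """Check a signature with a constant-time comparison."""
    expected = sign(secret, message)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received_signature)
```

In an FSM program this kind of check fits naturally in a tool step that runs before any condition step acts on the notification payload.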

What's coming next
Phase 0 (Immediate): ProgramValidator — static analysis at Program build time. Catches missing then/otherwise/next_step targets, unreachable steps, and cycle detection. Currently these fail at runtime; when dealing with LLM-generated workflows, static analysis is a must.
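
A first approximation of those three checks fits in one function over the plain-dict program format. This is a sketch, not the planned ProgramValidator: `validate_program` is a hypothetical name, and it assumes that a non-terminal step without then/otherwise/next_step falls through to the next step in the list, as the refund example earlier suggests.

```python
from typing import Dict, List

EDGE_KEYS = ("then", "otherwise", "next_step")


def validate_program(program: dict) -> List[str]:
    """Return a list of problems; an empty list means no issues were found."""
    steps = program.get("steps", [])
    if not steps:
        return ["program has no steps"]
    ids = {step["id"] for step in steps}
    errors: List[str] = []
    graph: Dict[str, List[str]] = {}

    for index, step in enumerate(steps):
        targets = [step[key] for key in EDGE_KEYS if key in step]
        if not targets and not step.get("is_terminal") and index + 1 < len(steps):
            targets.append(steps[index + 1]["id"])  # assumed fall-through
        graph[step["id"]] = []
        for target in targets:
            if target in ids:
                graph[step["id"]].append(target)
            else:
                errors.append(f"step {step['id']!r}: unknown target {target!r}")

    # Depth-first search from the first step: grey = on the current path.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {step_id: WHITE for step_id in graph}

    def visit(step_id: str) -> None:
        color[step_id] = GREY
        for nxt in graph[step_id]:
            if color[nxt] == GREY:
                errors.append(f"cycle: {step_id!r} -> {nxt!r}")
            elif color[nxt] == WHITE:
                visit(nxt)
        color[step_id] = BLACK

    visit(steps[0]["id"])
    errors.extend(
        f"step {step_id!r}: unreachable" for step_id in graph if color[step_id] == WHITE
    )
    return errors
```

Intentional loops (a polling step that retries, for instance) would show up as cycles here, so a real validator would likely need a way to mark them as allowed. The recursive search is also only suitable for programs of modest depth.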

Phase 1 (Gateway Correctness): StateContext persistence between MCP calls in SQLite WAL. Right now, if the gateway process restarts after /create but before polling completes, you get a new requestId — which is a real financial duplicate risk. Closing this with an execution_contexts table + upsert on every step. Up next: TRACE projection to SQLite, GovernedToolExecutor (policy-level tool capability enforcement), idempotency_store, and native vm.step() MCP wiring.
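
The execution_contexts idea can be sketched with the standard `sqlite3` module. The table name comes from the plan above; the column names, the helper functions, and the JSON encoding of the context are assumptions made for this example, not the planned schema.

```python
import json
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # write-ahead logging
    conn.execute(
        """CREATE TABLE IF NOT EXISTS execution_contexts (
               execution_id TEXT PRIMARY KEY,
               step_cursor  INTEGER NOT NULL,
               context_json TEXT NOT NULL
           )"""
    )
    conn.commit()
    return conn


def save_context(conn, execution_id: str, step_cursor: int, context: dict) -> None:
    """Upsert after every step so a restart resumes instead of starting over."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """INSERT INTO execution_contexts (execution_id, step_cursor, context_json)
               VALUES (?, ?, ?)
               ON CONFLICT(execution_id) DO UPDATE SET
                   step_cursor = excluded.step_cursor,
                   context_json = excluded.context_json""",
            (execution_id, step_cursor, json.dumps(context, sort_keys=True)),
        )


def load_context(conn, execution_id: str):
    row = conn.execute(
        "SELECT step_cursor, context_json FROM execution_contexts WHERE execution_id = ?",
        (execution_id,),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


# Usage: persist, "restart" the gateway by reopening the file, then reload.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "gateway.db")
    conn = open_store(db_path)
    save_context(conn, "exec-1", 1, {"requestId": "req-123"})
    conn.close()  # simulated process restart
    conn = open_store(db_path)
    restored = load_context(conn, "exec-1")
    conn.close()
```

After the simulated restart, `restored` still carries the original requestId, which is the property that removes the duplicate-payment risk described above. The `ON CONFLICT ... DO UPDATE` form needs SQLite 3.24 or newer.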

Phase 2 (Dev Agent): nano-vm-dev-agent — the FSM runtime managing its own development stack (read_repo_files → generate_patch(llm) → run_mypy → run_pytest → write_repo_files). DA-1 milestone is done (12/12 tests). DA-2 will be the first live run against a real sprint task (StateContext persistence). Still working on search_code and reproduce_bug tool-functions before launching live.

Phase 3 (Observability): OpenTelemetry span per FSM step + incremental counters in Trace (llm_calls, tool_calls, retries_total).

Install
pip install llm-nano-vm==0.8.0

pip install "llm-nano-vm[litellm]==0.8.0" # LiteLLM provider support (quoted so shells like zsh don't expand the brackets)

pip install nano-vm-mcp # MCP gateway

LLMs are completely optional. The runtime works perfectly fine as a pure, lightweight deterministic workflow engine.

Questions / feedback welcome!