Agents are workflows. SirenSpec is the workflow tool that admits it.
Tristan Smit · 2026-05-27 · via DEV Community

TL;DR: Most production "agents" are really just workflows: a fixed sequence of LLM calls with some branching. SirenSpec is a YAML-first SDK that treats them that way. A whole pipeline can live in one .yaml file that a teammate can read in 30 seconds, that you can validate before you run it, and that you can test in CI without spending a cent on tokens.

Two stories from the last year and a half.

A developer's autonomous agent burned through $47,000 in a runaway loop before anyone caught it. Another burned $4,200 over a single weekend: 63 hours of uncapped inference while its owner was at a wedding.

Neither developer was careless. Both wrote code that looked fine. The bug wasn't in any specific line. It was that the shape of the system (what runs, in what order, and what it's allowed to do) was scattered across a state machine in three Python files. You couldn't look at it. You could only trace it after something went wrong.

That's what gets me about the way most agent frameworks are built. Runaway loops are a known failure mode. But the deeper problem is that calling something an "agent" implies intelligence and autonomy, and that framing leads you to build something opaque by default. What both of those systems actually were, underneath the branding, was a sequence of LLM calls with some branching. A workflow. And workflows should be readable.

I'm Tristan, and I built SirenSpec because most production AI workflows shouldn't need a framework at all. They need a spec.


Most "agents" are just workflows

Worth asking before we go further: is "agent" even the right word for what most of us ship?

A few people arrived at the same answer recently. Anthropic's "Building Effective Agents" tells you to start with the simplest thing that works, usually a plain pipeline, and only reach for real agentic behavior when nothing simpler will do. Temporal's team was blunter: "agents are just workflows, really." And an arXiv survey of multi-agent pain points found the top two developer frustrations were orchestration semantics and policy enforcement, exactly the things that vanish into code in most frameworks.

Strip the branding and a production "agent" is usually a sequence of LLM calls, some shared context, a few conditional branches, and rules about what each step is allowed to do. That's a workflow. And if it's a workflow, the definition should be the first thing you read, not something you reconstruct from a pile of StateGraph.add_node() calls.


What SirenSpec is

SirenSpec is a YAML-first agent orchestration SDK. You write the whole pipeline in one file: the agents (model plus system prompt), the nodes (which agent runs, and where its output goes), the edges (order, plus optional branching), and the guardrails (injection detection, PII redaction, output validation, cost caps). Run it from the CLI and you get a JSON trace of every node, token, and decision.

```yaml
version: "0.1"
env_file: .env

agents:
  researcher:
    model: "openai:gpt-4o"
    system: "Summarize the following for a non-expert."
  writer:
    model: "anthropic:claude-3-5-sonnet-20241022"
    system: |
      Write a 200-word blog intro from this research:
      {{ research.output }}

nodes:
  research:
    agent: researcher
    writes: working.research
  write:
    agent: writer
    writes: output.draft

edges:
  - from: research
    to: write

guardrails:
  - injection
  - name: length
    config:
      max_chars: 1000
  - name: cost_cap
    config:
      max_usd: 0.10
```

A two-agent pipeline, and also the complete answer to what it does, what it's allowed to do, and in what order. You don't need Python to read it. That matters more than it sounds, because the person who needs to read it usually isn't the person who wrote it.
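To make the execution model concrete, here is a minimal sketch of how a spec shaped like the one above could run. It orders the nodes from `edges`, fills `{{ research.output }}`-style placeholders from earlier results, and calls a model once per node. This is an illustration under stated assumptions, not SirenSpec's engine. The spec is written as a Python dict to avoid a YAML dependency, and `fake_model` is a hypothetical stand-in for a provider call.

```python
import re
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical in-memory copy of the YAML spec above (no YAML parser needed).
SPEC = {
    "agents": {
        "researcher": {"model": "openai:gpt-4o",
                       "system": "Summarize the following for a non-expert."},
        "writer": {"model": "anthropic:claude-3-5-sonnet-20241022",
                   "system": "Write a 200-word blog intro from this research:\n{{ research.output }}"},
    },
    "nodes": {
        "research": {"agent": "researcher", "writes": "working.research"},
        "write": {"agent": "writer", "writes": "output.draft"},
    },
    "edges": [{"from": "research", "to": "write"}],
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\.output\s*\}\}")


def render(template: str, outputs: dict[str, str]) -> str:
    """Replace each {{ node.output }} with that node's earlier output."""
    return PLACEHOLDER.sub(lambda m: outputs[m.group(1)], template)


def run(spec: dict, user_input: str,
        call_model: Callable[[str, str, str], str]) -> dict[str, str]:
    # Map each node to its predecessors, then visit nodes in dependency order.
    graph = {name: set() for name in spec["nodes"]}
    for edge in spec["edges"]:
        graph[edge["to"]].add(edge["from"])
    outputs: dict[str, str] = {}
    state: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        node = spec["nodes"][name]
        agent = spec["agents"][node["agent"]]
        system = render(agent["system"], outputs)
        outputs[name] = call_model(agent["model"], system, user_input)
        state[node["writes"]] = outputs[name]  # e.g. "output.draft"
    return state


def fake_model(model: str, system: str, user: str) -> str:
    # Hypothetical stand-in: reports the model and the last line of its prompt.
    return f"[{model}] {system.splitlines()[-1]}"
```

Calling `run(SPEC, "some article text", fake_model)` returns a dict keyed by each node's `writes` target. The point of the design is that ordering and data flow come entirely from the spec, not from the code that executes it.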


Try it in 30 seconds

```bash
pip install sirenspec
sirenspec init            # scaffolds a workflow.yaml
sirenspec run workflow.yaml
```

Install, scaffold, run. No project setup, no boilerplate, and sirenspec validate will catch a broken workflow before it ever calls a model.


Three things it does differently

1. The whole workflow fits on one screen

Here's the GitHub triage example from the cookbook:

```yaml
version: "0.1"
env_file: .env

agents:
  classifier:
    model: "openai:gpt-4o-mini"
    system: |
      Classify this GitHub issue. Return JSON with:
      category (bug|feature|question|docs), priority (low|medium|high), needs_repro (bool).
      Issue: {{ inputs.message }}
    guardrails:
      - name: schema
        config:
          schema:
            type: object
            required: [category, priority, needs_repro]
  responder:
    model: "anthropic:claude-haiku-4-5-20251001"
    system: |
      Write a friendly triage response.
      Classification: {{ classify.output }}

nodes:
  classify:
    agent: classifier
    writes: working.classification
  respond:
    agent: responder
    writes: output.response

edges:
  - from: classify
    to: respond

guardrails:
  - injection
```

Writing the equivalent in Python means wiring up functions, managing prompt strings separately, and threading context between calls by hand. You're past 50 lines before you've written a single system prompt.

This isn't a line-count contest. It's about whether the shape of the workflow survives without a Python interpreter running in your head. Hand github-triage.yaml to your PM, your ops lead, or whoever inherits the project after you leave, and they can see what runs, in what order, and what it's not allowed to do. "Shorter code" and "a non-engineer can read it" are different claims. SirenSpec is going for the second one.
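The `schema` guardrail on the classifier in the triage example declares three `required` keys. Below is a rough stdlib sketch of that kind of output check, assuming the model returns a JSON string. The allowed-value checks for `category` and `priority` are my own addition, taken from the prompt text rather than from the YAML's schema block. This is not SirenSpec's implementation; a real one would more likely use a JSON Schema validator.

```python
import json

REQUIRED = ["category", "priority", "needs_repro"]  # mirrors `required:` in the YAML
# Assumption: allowed values come from the classifier prompt, not the schema block.
ALLOWED = {
    "category": {"bug", "feature", "question", "docs"},
    "priority": {"low", "medium", "high"},
}


def check_classification(raw: str) -> list[str]:
    """Return a list of problems with the model output; empty means it passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    problems = [f"missing required key '{key}'" for key in REQUIRED if key not in data]
    for key, allowed in ALLOWED.items():
        value = data.get(key)
        if key in data and (not isinstance(value, str) or value not in allowed):
            problems.append(f"{key}={value!r} not in {sorted(allowed)}")
    if "needs_repro" in data and not isinstance(data["needs_repro"], bool):
        problems.append("needs_repro must be a boolean")
    return problems
```

Returning a list of problems, rather than raising on the first one, makes it easy to log every violation from a single bad response.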

2. sirenspec validate fails before you push

Before a single API call fires:

```bash
sirenspec validate research-pipeline.yaml
```

```text
✗ Node 'analyze' references undefined agent 'analyzr' — did you mean 'analyzer'?
✗ agents.verify.system: field required
✗ InterpolationError in '{{ missing_node.output }}': node not found
```

Each line is a real class of bug. A typo'd agent name gets caught at load time by Pydantic instead of throwing a KeyError mid-run, a failure people hit in CrewAI. A node missing its system prompt surfaces here, not as a confusing provider error three steps in. And if node A's prompt references node B while B's references A, SirenSpec catches the cycle at load time; LangGraph lets you build it and only tells you at runtime.

validate exits 0 or 1, makes no API calls, and costs nothing to run. The bugs other frameworks find in production, yours finds in CI.
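None of these checks needs a model. They are static analysis over the spec. Here is a rough stdlib sketch of the same classes of checks: an undefined agent with a "did you mean" hint (via `difflib`), a missing `system`, an unresolved `{{ node.output }}`, and a reference cycle between prompts. It returns 0 or 1 like `validate`. The spec is a plain dict, the messages are modeled on the output above, and the cycle message wording is my own. This is not SirenSpec's Pydantic-based code.

```python
from __future__ import annotations

import difflib
import re

# Matches {{ node.output }} placeholders inside system prompts.
REF = re.compile(r"\{\{\s*(\w+)\.output\s*\}\}")


def find_cycle(deps: dict[str, set[str]]) -> list[str] | None:
    """Depth-first search over node -> referenced-node edges; returns one cycle or None."""
    state = {node: "unvisited" for node in deps}
    path: list[str] = []

    def visit(node: str) -> list[str] | None:
        state[node] = "in_progress"
        path.append(node)
        for ref in sorted(deps[node]):
            if state[ref] == "in_progress":  # back edge: a cycle
                return path[path.index(ref):] + [ref]
            if state[ref] == "unvisited":
                cycle = visit(ref)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for node in sorted(deps):
        if state[node] == "unvisited":
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def validate(spec: dict) -> tuple[int, list[str]]:
    """Static checks only, no model calls. Returns (exit_code, errors)."""
    agents, nodes = spec.get("agents", {}), spec.get("nodes", {})
    errors: list[str] = []
    for name, agent in agents.items():
        if "system" not in agent:
            errors.append(f"agents.{name}.system: field required")

    deps: dict[str, set[str]] = {}
    for name, node in nodes.items():
        agent_name = node.get("agent")
        if agent_name not in agents:
            guess = difflib.get_close_matches(str(agent_name), list(agents), n=1)
            hint = f": did you mean '{guess[0]}'?" if guess else ""
            errors.append(f"Node '{name}' references undefined agent '{agent_name}'{hint}")
            deps[name] = set()
            continue
        refs = set(REF.findall(agents[agent_name].get("system", "")))
        for missing in sorted(refs - set(nodes)):
            errors.append(f"InterpolationError in '{{{{ {missing}.output }}}}': node not found")
        deps[name] = refs & set(nodes)

    cycle = find_cycle(deps)
    if cycle:
        errors.append("Cycle in prompt references: " + " -> ".join(cycle))
    return (1 if errors else 0), errors
```

In a CI script you would pass the returned code to `sys.exit` so that a bad spec fails the job before any tokens are spent.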

3. Guardrails ship in the box

```yaml
agents:
  classifier:
    model: "openai:gpt-4o-mini"
    system: "Classify this support ticket."
    guardrails:
      - injection                    # prompt-injection detection
      - name: pii                    # redact before the model sees it
        config:
          entities: [email, phone, ssn]
      - name: length
        config:
          max_chars: 2000
```

These sit on the agent, right next to the model and the prompt. Not a separate library, not middleware, not a plugin you bolt on later. Cost caps live in the same place:

```yaml
guardrails:
  - name: cost_cap
    config:
      max_usd: 0.50
```

That one line is the difference between the $47K story and a run that stops itself. It's optional; skip it for a low-stakes internal tool, but when you want it, it's one line, and anyone can open the file and confirm it's there. You can't say that about a setting buried in a Python state machine.
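Conceptually, these guardrails are small checks wrapped around each model call. Below is a minimal sketch, assuming three things: regex-based redaction for the three entity types configured above (with deliberately naive, US-style phone and SSN patterns), a plain character-count length check, and a running budget that stops the run once it passes `max_usd`. This post doesn't say whether SirenSpec applies `length` to inputs, outputs, or both. The per-token prices are caller-supplied placeholders, not real provider rates. The `injection` check is not sketched, and none of this is SirenSpec's code.

```python
import re

# Deliberately naive patterns (US-style phone and SSN); real PII detection needs much more.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str, entities: list[str]) -> str:
    """Replace each configured entity type with a placeholder before the model sees it."""
    for entity in entities:
        text = PII_PATTERNS[entity].sub(f"[{entity.upper()}]", text)
    return text


def check_length(text: str, max_chars: int) -> None:
    """Reject text longer than max_chars."""
    if len(text) > max_chars:
        raise ValueError(f"length guardrail: {len(text)} > {max_chars} chars")


class CostCapExceeded(RuntimeError):
    """Raised once cumulative spend crosses the configured budget."""


class CostCap:
    """Tracks spend across every call in a run and halts the run past max_usd."""

    def __init__(self, max_usd: float) -> None:
        self.max_usd = max_usd
        self.spent = 0.0

    def record(self, input_tokens: int, output_tokens: int,
               usd_per_1k_in: float, usd_per_1k_out: float) -> None:
        # Prices are caller-supplied placeholders, not real provider rates.
        self.spent += (input_tokens / 1000) * usd_per_1k_in
        self.spent += (output_tokens / 1000) * usd_per_1k_out
        if self.spent > self.max_usd:
            raise CostCapExceeded(f"spent ${self.spent:.4f}, cap is ${self.max_usd:.2f}")
```

A runner would call `redact` and `check_length` before each model call and `CostCap.record` after it, using the token counts the provider reports. Spend is only known after a call returns, so this version can overshoot the cap by up to one call.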


Cassettes: tests that don't call the API

sirenspec test records a real run once, then replays it. After that, CI runs against the recording: deterministic, instant, no tokens.

```bash
# Record against the live API, once
sirenspec test tests/triage_test.yaml --record --cassette cassettes/run.yaml

# Replay in CI — no live calls
sirenspec test tests/triage_test.yaml --mock --cassette cassettes/run.yaml
```

The closest comparison is Pydantic AI's TestModel, but that's a mock: you assert against synthetic output. A cassette is the real model's response, run through your real pipeline. So when a model update quietly changes what you get back, re-recording the cassette turns that change into a failing test in a PR, not a strange trace in production three weeks later.
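At its core, a cassette is a lookup table from request to recorded response. Here is a minimal sketch of the record/replay idea behind `--record` and `--mock`, assuming responses are keyed by a hash of the model, system prompt, and user input. It stores JSON rather than the YAML cassette files shown above, and it is not SirenSpec's cassette format.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class Cassette:
    """Record real responses once, then replay them keyed by request content.

    mode="record" calls the live function and saves each result;
    mode="mock" only reads the file and fails loudly on unseen requests.
    """

    def __init__(self, path: Path, mode: str) -> None:
        if mode not in ("record", "mock"):
            raise ValueError("mode must be 'record' or 'mock'")
        self.path, self.mode = Path(path), mode
        self.entries: dict[str, str] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    @staticmethod
    def key(model: str, system: str, user: str) -> str:
        # Hash the full request so any prompt change produces a new key.
        blob = json.dumps([model, system, user])
        return hashlib.sha256(blob.encode()).hexdigest()

    def call(self, live: Callable[[str, str, str], str],
             model: str, system: str, user: str) -> str:
        k = self.key(model, system, user)
        if self.mode == "mock":
            if k not in self.entries:
                raise KeyError(f"no recorded response for {model}; re-record the cassette")
            return self.entries[k]
        response = live(model, system, user)
        self.entries[k] = response
        self.path.write_text(json.dumps(self.entries, indent=2))
        return response
```

Because the key covers the full request, editing a prompt makes replay fail with a missing entry instead of silently returning a stale answer. That is usually what you want in CI.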


Render: turn your YAML into a diagram

One command turns any workflow into a Mermaid flowchart:

```bash
sirenspec render workflow.yaml --target mermaid
```

Here's the output for the email triage example, a workflow that fetches your latest unread Gmail, fans out to three classifiers in parallel (urgency, intent, sender reputation), then routes to whichever response agent fits:

```mermaid
graph TD
    fetch_email[fetch_email\npython tool]
    triage[triage\nswrm]
    urgency[urgency]
    intent[intent]
    sender[sender]
    synthesis[synthesis]
    draft_reply[draft_reply]
    forward_note[forward_note]
    archive_reason[archive_reason]

    fetch_email --> triage
    triage --> urgency
    triage --> intent
    triage --> sender
    urgency --> synthesis
    intent --> synthesis
    sender --> synthesis
    synthesis -->|reply| draft_reply
    synthesis -->|forward| forward_note
    synthesis -->|archive| archive_reason
```

Paste it into any Mermaid renderer and you get a diagram of your pipeline without writing a single line of diagram code.

This matters because your workflow's audience is no longer just you. Your PM wants to know what it does. Your manager wants to audit it. And increasingly, your AI coding tools need to understand it too. Mermaid tends to be more token-efficient than ASCII diagrams for LLMs, and less likely to be misread. Drop a rendered diagram into your CLAUDE.md or project README and Codex, Claude Code, or whatever you're pairing with can orient itself in seconds.
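Rendering is mostly a mechanical walk over nodes and edges. Here is a minimal sketch of that walk, assuming the graph is already loaded as a dict of node labels plus a list of `(from, to, branch)` tuples. A non-empty branch becomes a `-->|reply|`-style labeled edge, as in the diagram above. This is not SirenSpec's renderer, and labels containing brackets or pipes would need quoting.

```python
from __future__ import annotations


def to_mermaid(nodes: dict[str, str], edges: list[tuple[str, str, str | None]]) -> str:
    """Emit a top-down Mermaid flowchart; nodes maps id -> label."""
    lines = ["graph TD"]
    for node_id, label in nodes.items():
        lines.append(f"    {node_id}[{label}]")
    lines.append("")
    for src, dst, branch in edges:
        # Conditional edges carry the branch name as a Mermaid edge label.
        arrow = f"-->|{branch}|" if branch else "-->"
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)
```

For example, `to_mermaid({"synthesis": "synthesis", "draft_reply": "draft_reply"}, [("synthesis", "draft_reply", "reply")])` produces the header, the two node lines, and `synthesis -->|reply| draft_reply`.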


What it isn't

If you're sizing this up, here's where it stops.

No open-ended dynamic loops, no autonomous tool selection, no handoffs, no memory layer. You write the graph; the graph runs. Connectors, web browsing, and richer tool integrations are on the roadmap, but they're still in planning.

SirenSpec is for the script you've already written more than once: the one that calls OpenAI, retries on a 429, checks a JSON shape, counts tokens, and hopes. That script, with a spec you can read, a validator, and tests around it.


At a glance

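| Capability | SirenSpec | Big Agent Frameworks | Raw SDK |
|---|---|---|---|
| Readable by non-engineers | Yes | | |
| Pre-run validation | Yes | | |
| Guardrails built in | Yes | DIY | DIY |
| CI tests via cassettes | Yes | | |
| Dynamic agent loops | No | | |
| Provider-agnostic | Yes | Varies | |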

A few questions I get

Does it support loops? Yes, via factory nodes. A factory iterates over a list and runs one agent instance per item, with configurable concurrency. The changelog annotator is a good example: one classifier per commit, then a release writer that aggregates them. Autonomous tool selection and open-ended handoffs are not supported.

Which providers? OpenAI, Anthropic, and Ollama today. Gemini, Bedrock, and Groq are on the list.

Why YAML instead of Python? Because the workflow is the thing you want to read, diff in a PR, and hand to someone who doesn't write Python. When the definition lives inside code, "what does this pipeline actually do?" stops having a quick answer.

How do I run the workflow in production? SirenSpec currently ships a lightweight Python SDK with the install. You can load your workflow into Python and execute it in a variety of ways.
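The factory nodes mentioned above are, in essence, a bounded fan-out followed by an aggregation step. Here is a minimal asyncio sketch of that pattern. `classify_commit` and the join at the end are hypothetical stand-ins for the changelog annotator's per-commit classifier and release writer. The `concurrency` parameter mirrors the configurable concurrency described above, not SirenSpec's actual setting name.

```python
import asyncio
from typing import Awaitable, Callable


async def run_factory(items: list[str],
                      agent: Callable[[str], Awaitable[str]],
                      concurrency: int = 4) -> list[str]:
    """Run one agent call per item, at most `concurrency` at a time; keeps input order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def one(item: str) -> str:
        async with semaphore:
            return await agent(item)

    return await asyncio.gather(*(one(item) for item in items))


async def classify_commit(message: str) -> str:
    # Hypothetical stand-in for a per-commit classifier agent.
    await asyncio.sleep(0)
    return "fix" if message.lower().startswith("fix") else "feature"


async def changelog(commits: list[str]) -> str:
    labels = await run_factory(commits, classify_commit, concurrency=2)
    # Aggregation step, standing in for the release-writer agent.
    return "\n".join(f"- [{label}] {msg}" for label, msg in zip(labels, commits))
```

`asyncio.gather` returns results in the order the items were passed in, so the aggregation step can zip labels back to commits even though the calls may finish out of order.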


Final Thoughts

A lot of “agents” in production are really just workflows with retries, branching, and memory layered on top. That realization is what led me to build SirenSpec.

We’re still early at v0.1.1, which makes this a fun stage to experiment in.

I'd love to hear from you:

  • How much of your company's “agent” stack is actually deterministic underneath?

  • If you're a non-technical founder, PM, or hobbyist vibecoder, when have you hit a wall building AI workflows or agents?

If any of that sounds familiar, I’d love to hear how your team is approaching it. You can check out SirenSpec on GitHub or browse the docs.