An Ash-Inspired DDD Framework for Python: Derive Everything from the Domain Model
Sahil Pohare · 2026-05-24 · via DEV Community

I've been building PingerAgents, a multitenant AI agent orchestration platform. Along the way I built something that felt worth writing up on its own: a domain-driven design framework for Python that derives persistence, durable execution handlers, and tenant isolation from a single resource class.

It's inspired by Elixir's Ash framework. This is a writeup of what it is, how it works, and the mistakes I made building it.


The Problem with Backend Glue

Most backend code isn't business logic. It's wiring. You write a domain object, then you write:

  • A SQLAlchemy model to persist it
  • A repository with upsert semantics
  • HTTP handlers or message queue consumers to receive commands
  • Tenant filtering on every query
  • Retry logic, idempotency checks, concurrency controls

Then you repeat this for every entity. The business logic — the part that actually matters — is a thin layer inside a thick shell of infrastructure glue.

The observation that drives this framework: if the domain model is expressive enough, all of that can be derived.


The Framework

Resources

A Resource subclass is simultaneously a SQLAlchemy model, an upsert repository, and a Restate VirtualObject. You write one class:

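from datetime import datetime

from sqlalchemy import DateTime, String
from sqlalchemy.orm import Mapped, mapped_column

from ironbridge.shared.framework import Resource, ActionKind, action
from ironbridge.shared.framework.effects import ActionContext

# _cuid, _utcnow, and WidgetView are project helpers that are not shown here.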

class Widget(Resource):
    class Meta:
        tenant_scoped  = True   # inject tenant_id, enforce via Postgres RLS
        restate_object = True   # derive a Restate VirtualObject

    __tablename__ = "widgets"

    id         : Mapped[str]      = mapped_column(String, primary_key=True, default=_cuid)
    name       : Mapped[str]      = mapped_column(String, nullable=False)
    status     : Mapped[str]      = mapped_column(String, default="ACTIVE")
    created_at : Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)

    @action(kind=ActionKind.CREATE)
    def create(self, name: str) -> "Widget":
        self.name = name
        return self

    @action(kind=ActionKind.UPDATE)
    def deactivate(self) -> "Widget":
        if self.status == "INACTIVE":
            raise ValueError("Already inactive")
        self.status = "INACTIVE"
        return self

    @action(kind=ActionKind.READ)
    def get(self) -> WidgetView:
        return WidgetView(id=self.id, name=self.name, status=self.status)


From this, the framework derives at startup:

| Artifact | Source |
|---|---|
| SQLAlchemy ORM model | Mapped[] column declarations (standard SQLAlchemy, no custom wrappers) |
| tenant_id column + Postgres RLS policy | Meta.tenant_scoped = True |
| Upsert repository | ORM model |
| Restate VirtualObject + one handler per action | Meta.restate_object = True + @action |
| Exclusive vs shared handler concurrency | ActionKind |
| Effect execution after DB write | ActionContext |

No code generation. No separate schema files. One class, everything derived at import time.

ResourceMeta: the metaclass

ResourceMeta extends SQLAlchemy's DeclarativeBase metaclass and runs at class definition time. It does three things:

  1. Injects tenancy columns. If Meta.tenant_scoped = True, it injects tenant_id (or whatever Meta.tenancy_key is) as a mapped_column with server_default = current_setting('app.tenant_id', true). Domain code never declares this column.

  2. Collects actions. It scans the class namespace for methods decorated with @action and builds cls.__actions__: dict[str, ActionMeta]. Inherited actions are included; methods that shadow parent actions without @action are excluded.

  3. Registers the class globally. The global resource registry maps class names to classes so the derive layer can find them at startup without circular imports.

class ResourceMeta(type(Base)):
    def __new__(mcs, name, bases, namespace):
        # collect @action methods into cls.__actions__
        # parse cls.Meta into cls.__meta__
        # inject tenant_id column if tenant_scoped
        # register in global registry
        ...

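The action-collection and registry steps can be illustrated without SQLAlchemy. The following is a minimal sketch, not the framework's code: ToyResourceMeta, ActionMeta's fields, REGISTRY, and the `__action_kind__` marker are all hypothetical names, and the metaclass extends plain `type` rather than SQLAlchemy's declarative metaclass.

```python
from dataclasses import dataclass
from enum import Enum


class ActionKind(Enum):
    CREATE = "create"
    UPDATE = "update"
    READ = "read"


@dataclass(frozen=True)
class ActionMeta:
    name: str
    kind: ActionKind


def action(kind: ActionKind):
    """Tag a method so the metaclass can find it at class definition time."""
    def decorator(fn):
        fn.__action_kind__ = kind  # hypothetical marker attribute
        return fn
    return decorator


REGISTRY: dict = {}  # hypothetical global registry: class name -> class


class ToyResourceMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        actions = {}
        # Start from inherited actions; iterate in reverse so earlier bases win.
        for base in reversed(bases):
            actions.update(getattr(base, "__actions__", {}))
        for attr, value in namespace.items():
            kind = getattr(value, "__action_kind__", None)
            if kind is not None:
                actions[attr] = ActionMeta(attr, kind)
            elif attr in actions:
                # Redefined without @action: it shadows the parent action.
                del actions[attr]
        cls.__actions__ = actions
        REGISTRY[name] = cls
        return cls


class Parent(metaclass=ToyResourceMeta):
    @action(kind=ActionKind.UPDATE)
    def deactivate(self): ...

    @action(kind=ActionKind.READ)
    def get(self): ...


class Child(Parent):
    @action(kind=ActionKind.CREATE)
    def create(self): ...

    def get(self):  # shadows Parent.get without @action
        ...
```

Here Child keeps the inherited deactivate, gains create, and loses get, which matches the inclusion and shadowing rule described in step 2.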

ActionKind drives DB and concurrency behaviour

The action kind tells the framework what to do with the return value and how to configure the Restate handler:

| Kind | DB op | Restate concurrency |
|---|---|---|
| CREATE | INSERT + auto-save returned resource | exclusive |
| UPDATE | UPSERT + auto-save returned resource | exclusive |
| DESTROY | DELETE by key | exclusive |
| ACTION | domain controls all writes via ActionContext | exclusive |
| READ | no write | shared (concurrent) |
| STREAM | no write | shared (concurrent) |

CREATE and UPDATE auto-save the returned Resource — the framework calls repo.save(result). ACTION hands full control to domain code; the framework only executes the collected effects after the DB write. READ and STREAM get Restate shared handlers so concurrent reads don't block each other.
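
One way to picture this mapping is as a lookup table keyed by action kind. This is a sketch only: HandlerPolicy, POLICIES, and handler_mode are hypothetical names, and the framework may encode the same rules differently.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ActionKind(Enum):
    CREATE = auto()
    UPDATE = auto()
    DESTROY = auto()
    ACTION = auto()
    READ = auto()
    STREAM = auto()


@dataclass(frozen=True)
class HandlerPolicy:
    db_op: Optional[str]  # write the framework performs itself, if any
    auto_save: bool       # True: framework saves the returned resource
    exclusive: bool       # True: exclusive handler; False: shared handler


POLICIES = {
    ActionKind.CREATE: HandlerPolicy("insert", True, True),
    ActionKind.UPDATE: HandlerPolicy("upsert", True, True),
    ActionKind.DESTROY: HandlerPolicy("delete", False, True),
    ActionKind.ACTION: HandlerPolicy(None, False, True),  # domain code owns the writes
    ActionKind.READ: HandlerPolicy(None, False, False),
    ActionKind.STREAM: HandlerPolicy(None, False, False),
}


def handler_mode(kind: ActionKind) -> str:
    """Return the concurrency mode a derived handler would be registered with."""
    return "exclusive" if POLICIES[kind].exclusive else "shared"
```

Keeping the rules in one table means the derive layer can branch on data instead of on the names of individual actions.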

The action body rules are strict: no I/O, no raw SQL, no external calls. Validate, mutate, return. The framework handles persistence.

# Good
@action(kind=ActionKind.UPDATE)
def deactivate(self) -> "Widget":
    if self.status == "INACTIVE":
        raise ValueError("Already inactive")
    self.status = "INACTIVE"
    return self

# Bad — I/O in action body
@action(kind=ActionKind.UPDATE)
def deactivate(self) -> "Widget":
    db.execute(...)       # don't — framework handles persistence
    requests.post(...)    # don't — use ActionContext for side effects
    return self


Effects are data, not calls

Domain code never imports Restate. Side effects are declared via ActionContext, a plain Python object passed into ACTION handlers:

@action(kind=ActionKind.ACTION)
def add_message(self, action_ctx: ActionContext, content: dict, participant_id: str, ...) -> Message:
    msg = Message(
        id=_cuid(),
        thread_id=self.id,
        content=content,
        participant_id=participant_id,
    )

    # If this is a human message, start an agent run
    # (participant_type is presumably one of the parameters elided in the signature above)
    if participant_type == "HUMAN":
        run_id = _cuid()
        action_ctx.send_workflow("AgentRun", key=run_id, arg={
            "run_id": run_id,
            "thread_id": self.id,
            "tenant_id": self.tenant_id,
        })

    # Fan out to all channels bound to this thread
    for channel_id in resolve_channels_for_thread(self.id, self.tenant_id):
        action_ctx.send_after(
            "ChannelDelivery", "deliver", key=channel_id,
            factory=lambda result, cid=channel_id: {
                "channel_id": cid,
                "message": {"position": result["position"], "content": content, ...},
            },
        )

    return msg


ActionContext has three methods:

  • send(service, handler, key, arg) — fire-and-forget to a VirtualObject handler
  • send_after(service, handler, key, factory) — fire-and-forget, but arg built from the action result after the DB write
  • send_workflow(service, key, arg, handler) — start or signal a Restate Workflow

send_after is the subtle one. The message position field is assigned by the framework's position counter after ctx.run() — the domain code can't know it when constructing the effect. send_after takes a factory function that receives the serialized result dict after ctx.run() returns. The derive layer (restate.py) calls factory(result) generically — it contains zero knowledge of ChannelDelivery, position fields, or why they matter.

To make this concrete, here's what the derive layer does after every ACTION handler:

# Inside _attach_handler, after ctx.run() returns the serialized result:
result = await ctx.run(action_name, _run)

# Update position counter in Restate state (for add_message-style actions)
if is_positional and next_pos is not None:
    ctx.set("position", next_pos)

# Execute effects collected by ActionContext during _run
if action_ctx:
    for effect in action_ctx.effects:
        if isinstance(effect, DeferredSendEffect):
            arg = effect.factory(result)   # factory receives the result here
            ctx.generic_send(effect.service, effect.handler, json.dumps(arg).encode(), key=effect.key)
        elif isinstance(effect, SendEffect):
            ctx.generic_send(effect.service, effect.handler, json.dumps(effect.arg).encode(), key=effect.key)
        elif isinstance(effect, WorkflowEffect):
            # handler_fn is the workflow handler resolved from the effect (lookup not shown)
            ctx.workflow_send(handler_fn, key=effect.key, arg=effect.arg)


result is the dict that came back from ctx.run() — which includes position because the framework assigned it inside _run before returning. The factory sees the fully-populated result. The derive layer calls factory(result) generically; it doesn't know what fields the factory reads.

Effects are data. The derive layer executes them. Domain code is insulated from infrastructure.
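
The pattern can be reproduced in a few lines of plain Python. The sketch below is an illustration under assumptions: ToyActionContext and execute_effects are hypothetical stand-ins, the effect classes only mirror the names used above, and `deliver` replaces the real Restate send call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class SendEffect:
    service: str
    handler: str
    key: str
    arg: dict


@dataclass(frozen=True)
class DeferredSendEffect:
    service: str
    handler: str
    key: str
    factory: Callable[[dict], dict]  # builds the arg from the action result


@dataclass
class ToyActionContext:
    effects: list = field(default_factory=list)

    def send(self, service, handler, key, arg):
        self.effects.append(SendEffect(service, handler, key, arg))

    def send_after(self, service, handler, key, factory):
        self.effects.append(DeferredSendEffect(service, handler, key, factory))


def execute_effects(action_ctx, result, deliver):
    """Run collected effects; deferred ones see the result only at this point."""
    for effect in action_ctx.effects:
        if isinstance(effect, DeferredSendEffect):
            arg = effect.factory(result)
        else:
            arg = effect.arg
        deliver(effect.service, effect.handler, effect.key, arg)


# Domain side: declare effects without knowing the final position.
ctx = ToyActionContext()
for cid in ("chan-1", "chan-2"):
    ctx.send_after(
        "ChannelDelivery", "deliver", key=cid,
        factory=lambda result, cid=cid: {"channel_id": cid, "position": result["position"]},
    )

# Infrastructure side: the result (with position) exists only after the write.
sent = []
execute_effects(ctx, {"position": 7}, lambda *call: sent.append(call))
```

The `cid=cid` default argument pins each channel id at declaration time, the same trick the add_message example uses to avoid late binding in the loop.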

derive_virtual_object: fully generic

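derive_virtual_object(resource_cls) reads cls.__actions__ and cls.__meta__ and generates a Restate VirtualObject with correct handler concurrency, position counters, idempotency, and effect execution. It contains no knowledge of any specific resource; the one convention it relies on is that a resource defining add_message gets queue-management handlers: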

def derive_virtual_object(resource_cls: type[Resource]) -> restate.VirtualObject:
    obj = restate.VirtualObject(resource_cls.__name__)

    for action_name, action_meta in resource_cls.__actions__.items():
        _attach_handler(obj, resource_cls, action_name, action_meta)

    # If resource has add_message, inject queue management handlers
    if "add_message" in resource_cls.__actions__:
        _attach_queue_handlers(obj, resource_cls.__name__)

    return obj


Each generated handler:

  1. Extracts auth/tenant from the request
  2. Recovers the position counter if needed (for add_message-style actions)
  3. Calls ctx.run(action_name, _run) where _run opens a tenant session, loads the resource by key, calls the action method, and saves if needed
  4. Updates the position counter in Restate state
  5. Executes all collected effects via _execute_effects

The _run callback touches only Postgres: no Restate operations inside it, no HTTP calls. Everything that touches Restate happens outside the callback.

Tenant isolation is structural

tenant_scoped = True triggers three layers of enforcement:

Layer 1: Column injection. ResourceMeta injects tenant_id with server_default = current_setting('app.tenant_id', true). If application code forgets to set it, Postgres fills it from the session variable.

Layer 2: Session variable. tenant_session("tenant-abc") is a context manager that runs SET LOCAL app.tenant_id = :tid on the connection before any query.

Layer 3: RLS policy. An Alembic migration runs CREATE POLICY tenant_isolation ON widgets USING (tenant_id = current_setting('app.tenant_id', true)).

The consequence: a query that omits WHERE tenant_id = ? returns only the current tenant's rows, and a query that runs without the session variable set returns zero rows, not all rows. Bugs fail closed. A forgotten filter cannot leak cross-tenant data, provided the application connects as a role that RLS applies to (Postgres exempts superusers, roles with BYPASSRLS, and by default the table owner).

with tenant_session("tenant-abc") as db:
    repo = SqlAlchemyRepository(db, Widget)
    widgets = repo.list()   # no WHERE clause — RLS filters automatically

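The session-variable layer can be sketched as a small context manager. This is not the framework's tenant_session: the version below takes an explicit connection, RecordingConnection is a hypothetical stand-in that records statements instead of talking to Postgres, and the `%(tid)s` placeholder style assumes a psycopg-like driver. It uses set_config with is_local = true, the function form of SET LOCAL, because it accepts a bound parameter.

```python
from contextlib import contextmanager


class RecordingConnection:
    """Stand-in for a DB connection; records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append((sql, params))


@contextmanager
def tenant_session(conn, tenant_id):
    """Scope every statement in the block to one tenant."""
    conn.execute("BEGIN")
    try:
        # Third argument true = transaction-local, so the value is cleared
        # at COMMIT or ROLLBACK and cannot leak to the next pooled user.
        conn.execute(
            "SELECT set_config('app.tenant_id', %(tid)s, true)",
            {"tid": tenant_id},
        )
        yield conn
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = RecordingConnection()
with tenant_session(conn, "tenant-abc") as db:
    db.execute("SELECT * FROM widgets")  # no WHERE clause; RLS would filter

issued = [sql for sql, _ in conn.statements]
```

Because the setting is transaction-local, the tenant id has to be set inside the same transaction as the queries it protects, which is why the sketch opens the transaction first.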

All writes are upserts

SqlAlchemyRepository.save() issues INSERT ... ON CONFLICT DO UPDATE by default. This is not a stylistic preference; durable execution requires it for correctness.

Restate journals the result of ctx.run() only after the callback finishes. If the process crashes after the DB commit but before that result is journaled, the retry re-executes the callback. A bare INSERT raises a duplicate-key error on that second run. An upsert is idempotent.

For some resources the conflict target isn't the primary key. Message has a natural key (thread_id, idempotency_key): two messages in the same thread with the same idempotency key are the same message. After a Restate purge, workflows replay with newly generated message IDs but the same idempotency keys. ON CONFLICT (id) DO UPDATE would find no conflict on the new ID, so the INSERT would proceed and then violate the unique constraint on the natural key. The correct target is the natural key with DO NOTHING:

class Message(Resource):
    class Meta:
        conflict_columns = ("thread_id", "idempotency_key")
        conflict_action  = "nothing"


ResourceMeta parses these and the repository uses them automatically.
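
The difference between the two conflict targets is easy to reproduce. The sketch below uses SQLite from the standard library as a stand-in for Postgres (SQLite's upsert syntax is modeled on PostgreSQL's); the table layout and the helper names save_on_id and save_on_natural_key are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        id              TEXT PRIMARY KEY,
        thread_id       TEXT NOT NULL,
        idempotency_key TEXT NOT NULL,
        content         TEXT NOT NULL,
        UNIQUE (thread_id, idempotency_key)
    )
""")

INSERT = "INSERT INTO messages (id, thread_id, idempotency_key, content) VALUES (?, ?, ?, ?) "
ROW = ("thread-1", "key-1", "hello")


def save_on_id(msg_id):
    # Conflict target is the primary key only.
    conn.execute(INSERT + "ON CONFLICT (id) DO UPDATE SET content = excluded.content",
                 (msg_id, *ROW))


def save_on_natural_key(msg_id):
    # Conflict target is the natural key; an existing row wins.
    conn.execute(INSERT + "ON CONFLICT (thread_id, idempotency_key) DO NOTHING",
                 (msg_id, *ROW))


save_on_id("m1")
save_on_id("m1")  # replay with the same id: idempotent

try:
    save_on_id("m2")  # replay with a new id but the same idempotency key
    new_id_replay_failed = False
except sqlite3.IntegrityError:
    new_id_replay_failed = True  # UNIQUE (thread_id, idempotency_key) fires

save_on_natural_key("m3")  # same replay, natural-key target: silently skipped
stored_ids = [row[0] for row in conn.execute("SELECT id FROM messages")]
```

The replay with a fresh id fails under the primary-key target and is a no-op under the natural-key target, leaving the original row untouched.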


What I Tried First and Deleted

Custom field types. An early version had fields.py with CuidField, StringField, DateTimeField and a separate derive/orm.py that emitted SQLAlchemy models from those field definitions. It was reimplementing SQLAlchemy, badly: two files, ~300 lines, and worse type safety than native Mapped[]. I deleted both. Resource now inherits directly from DeclarativeBase.

Content hash idempotency keys. Tried auto-generating idempotency keys from a hash of message content to remove them from the domain API. Problem: two different senders can produce identical content. A hash-based key would deduplicate messages from different senders — wrong semantics. Reverted to caller-supplied keys.
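
The collision takes only a few lines to demonstrate. This is a sketch of the rejected approach, assuming a SHA-256 hash over canonical JSON; content_hash_key is a hypothetical helper, not the code that was reverted.

```python
import hashlib
import json


def content_hash_key(content: dict) -> str:
    """Derive an idempotency key from message content alone."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


from_alice = {"sender": "alice", "content": {"type": "text", "text": "ok"}}
from_bob = {"sender": "bob", "content": {"type": "text", "text": "ok"}}

# Two distinct messages, one key: the second would be dropped as a duplicate.
keys_collide = content_hash_key(from_alice["content"]) == content_hash_key(from_bob["content"])
```

Mixing the sender into the hash would separate these two, but one sender deliberately repeating the same text would still collapse into a single message, which is one reason to leave the key with the caller.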

Application-layer tenant filters. Started with filter_by(tenant_id=...) in every repository method. Moved to RLS. The difference: application filters fail open (missing filter = all rows); RLS fails closed (missing SET LOCAL = empty session variable = zero rows). One enforcement point, structurally correct.

Single ConversationWorkflow per thread. Tried making each thread a Restate Workflow that persisted conversation state in Restate's state store. Blocked by a Restate SDK bug: WorkflowSharedContext.get/set fails when the handler input is a Pydantic model, because the SDK tries to serialize the input during journal bookkeeping and fails with a "not JSON serializable" error. Reverted to a Thread VirtualObject (for message ordering) plus an AgentRun Workflow (for agent execution).


The Bugs That Taught Me Things

SQLAlchemy relationships inflate the journal. _serialize() originally walked the full object graph including relationship collections. A Thread with 200 messages would serialize all 200 messages into the journal entry for every single add_message call. This caused Pusher 413 errors (payload too large) and made journal entries balloon in size. Fix: skip relationship collections in _serialize() entirely. Handler return values don't need related objects — they're not part of the action contract.

ON CONFLICT (id) breaks after Restate purge. After clearing Restate state, workflows replay. If the replay generates a new message ID but the same idempotency key, ON CONFLICT (id) DO UPDATE has no conflict on the new ID — INSERT proceeds, then hits the UNIQUE(thread_id, idempotency_key) constraint. Fix: ON CONFLICT (thread_id, idempotency_key) DO NOTHING. If the DB has the message already (it's the source of truth), skip silently.

httpx.post inside a workflow handler deadlocks. Early code wrote error messages by calling httpx.post to the Restate ingress from inside the workflow handler. This caused a deadlock: the workflow was executing inside Restate's handler loop, and synchronously calling back into Restate ingress blocked. _run_done never fired, thread queues blocked permanently. Fix: all sends from workflow handlers go through ctx.generic_send — fire-and-forget, non-blocking, journaled.

Dead VirtualObject state after Restate purge. After podman compose down -v, Restate state is wiped but the Thread VirtualObject's active_run_id was set in its state before the purge. _run_done will never fire because the workflow no longer exists. Without a liveness check, the thread queue blocks forever on every purge. Fix: _enqueue_run calls ctx.workflow_call(status_fn, key=active_run_id) before assuming a run is live. If status is not "running", clear state and fire immediately.


The Agentic Layer

The framework's primary value is as a DDD toolkit — any Python service combining durable execution with a relational database can use it. But for Ironbridge specifically, the framework underpins an agentic execution layer.

The agent contract is minimal:

class BaseAgent(ABC):
    @abstractmethod
    async def run(self, ctx: AgentContext) -> None: ...


AgentContext wraps Restate's WorkflowContext and exposes domain-level primitives: durable steps, history fetch, message write, human-in-the-loop suspension. Agents import nothing from Restate or SQLAlchemy.

class WeatherAgent(BaseAgent):
    async def run(self, ctx: AgentContext) -> None:
        history = await ctx.step("fetch_history", ctx.get_history)
        response = await ctx.step("llm_call_0", lambda: call_llm(history))
        ctx.write_message({"version": 1, "parts": [{"type": "text", "text": response}]})

agent_registry.register("weather", WeatherAgent)


Each ctx.step() is journaled. If the process crashes between steps, replay picks up where it left off. Human-in-the-loop uses named promises: the agent suspends, a message appears in the thread, the human replies, the promise resolves, and the agent resumes. Cancellation is a durable promise peeked before each step: when a new user message arrives, the active run is cancelled at the next step boundary.
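
Cancellation at a step boundary can be sketched with a toy context. Everything here is hypothetical: ToyAgentContext, RunCancelled, and the plain boolean standing in for the durable cancellation promise. The sketch omits journaling and shows only the check-before-step behaviour.

```python
import asyncio


class RunCancelled(Exception):
    """Raised when a step is about to start on a cancelled run."""


class ToyAgentContext:
    def __init__(self):
        self.cancelled = False  # stand-in for the durable cancellation promise
        self.completed = []

    async def step(self, name, fn):
        if self.cancelled:  # peek before starting the step
            raise RunCancelled(name)
        result = fn()
        if asyncio.iscoroutine(result):
            result = await result
        self.completed.append(name)
        return result


async def agent(ctx):
    history = await ctx.step("fetch_history", lambda: ["hi"])
    ctx.cancelled = True  # simulate a new user message arriving mid-run
    await ctx.step("llm_call_0", lambda: "reply to " + history[0])


def run_demo():
    ctx = ToyAgentContext()
    try:
        asyncio.run(agent(ctx))
        outcome = "finished"
    except RunCancelled as exc:
        outcome = f"cancelled before {exc}"
    return ctx.completed, outcome


completed, outcome = run_demo()
```

A step that has already started is allowed to finish; only the next one is refused, so the run never stops halfway through a unit of work.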

The framework handles all of this. The agent is just domain logic.


Guiding Principles

These held up over 45 architectural decisions:

  1. Postgres is the source of truth. Restate journals execution intent, not state. All reads go to Postgres.
  2. Tenant isolation is structural. RLS at the DB layer. A query that runs without the tenant session variable returns zero rows.
  3. Domain has no infrastructure imports. Effects are data declared via ActionContext. Infrastructure executes them.
  4. All writes are upserts. Durable execution replays callbacks. Idempotency must be structural, not optional.
  5. Effects execute after the DB write, durably. Postgres and Restate share no distributed transaction manager — true atomicity across both is impossible without a 2PC or outbox pattern. What the framework does instead: effects are fired inside the Restate handler after ctx.run() returns, and those sends are themselves journaled by Restate. If the process crashes between the DB write and the send, Restate replays the handler — ctx.run() returns its cached result (no double-write), and the sends fire. The DB write is the source of truth; the effect delivery is guaranteed by Restate's replay. The window of inconsistency is a process crash between ctx.run() completing and the next Restate journal entry — which Restate closes by design.
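
The crash-and-replay argument in principle 5 can be traced with a toy journal. This is a simplified model, not Restate: ToyJournal, Crash, and handler are hypothetical, and the journal is an in-memory dict that survives the simulated crash.

```python
class ToyJournal:
    """Remembers completed steps so a retry can skip them."""

    def __init__(self):
        self.entries = {}

    def run(self, name, fn):
        # First execution runs fn; replays return the recorded result.
        if name not in self.entries:
            self.entries[name] = fn()
        return self.entries[name]

    def send(self, name, outbox, message):
        # A send is journaled too, so a replay does not deliver it twice.
        if name not in self.entries:
            outbox.append(message)
            self.entries[name] = True


class Crash(Exception):
    pass


def handler(journal, db, outbox, crash_after_write=False):
    def write():
        db.append("message-1")  # stands in for the Postgres write
        return {"position": len(db)}

    result = journal.run("add_message", write)
    if crash_after_write:
        raise Crash()  # process dies between the write and the send
    journal.send("deliver", outbox, {"position": result["position"]})


journal, db, outbox = ToyJournal(), [], []
try:
    handler(journal, db, outbox, crash_after_write=True)
except Crash:
    pass
handler(journal, db, outbox)  # retry with the same journal
handler(journal, db, outbox)  # a further retry changes nothing
```

After the crash the database holds the row and the outbox is empty; the retry skips the write, reuses its recorded result, and delivers the effect exactly once.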

Stack

Python, FastAPI, Restate, Postgres (RLS), Alembic, Hypercorn (HTTP/2 required for Restate), Podman Compose.

The framework layer (shared/framework/ and shared/derive/) has no domain knowledge. The platform layer (platform/) has no Restate imports except through the framework. Concrete agents and adapters (services/) have neither.

45 ADRs in docs/decisions.md for anyone who wants to go deeper.