The order flow involves four services: payment, inventory, fulfillment, notifications. The textbook answer is “wrap it in a distributed transaction.” The textbook is wrong. Two-phase commit (2PC) requires every participating service to support XA or similar distributed-transaction protocols, which Postgres does, MySQL does poorly, and most third-party services (Stripe, SendGrid, your search backend) do not at all.
Real systems use sagas. A saga is a sequence of local transactions, each of which has a compensating action that undoes it. If step 4 fails, you run the compensations for steps 3, 2, 1 in reverse. The transaction is eventually consistent, not atomic, but it is implementable.
This post covers the comparison between 2PC and sagas, the two saga implementation patterns (orchestrated vs. choreographed), and the compensation rules that hold up in production.
The 2PC protocol:
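As a rough in-memory sketch of the two phases: the coordinator first asks every participant to prepare and collects votes, then commits only if all of them voted yes. The `Participant` interface and the `makeParticipant` helper are hypothetical, invented for illustration; they are not any real XA API.

```typescript
// Hypothetical participant interface (an assumption, not a real XA client).
interface Participant {
  name: string;
  prepare(): boolean; // phase 1: vote yes (true) or no (false)
  commit(): void;     // phase 2: make the prepared work permanent
  rollback(): void;   // phase 2: discard the prepared work
}

type Outcome = 'committed' | 'aborted';

function twoPhaseCommit(participants: Participant[]): Outcome {
  const prepared: Participant[] = [];

  // Phase 1: ask every participant to prepare and collect the votes.
  for (const p of participants) {
    if (p.prepare()) {
      prepared.push(p);
    } else {
      // A single "no" vote aborts the transaction for everyone who prepared.
      for (const q of prepared) q.rollback();
      return 'aborted';
    }
  }

  // Phase 2: all votes were "yes", so tell every participant to commit.
  for (const p of participants) p.commit();
  return 'committed';
}

// In-memory participant that records each call in a shared log.
function makeParticipant(name: string, vote: boolean, log: string[]): Participant {
  return {
    name,
    prepare: () => { log.push(`${name}:prepare`); return vote; },
    commit: () => { log.push(`${name}:commit`); },
    rollback: () => { log.push(`${name}:rollback`); },
  };
}
```

The sketch leaves out the hard part: a real coordinator can crash between the two phases, and participants that already voted yes must hold their locks until it comes back.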
The properties that make 2PC look attractive: atomic across services, no compensation needed. The properties that make 2PC unworkable in practice:
Every participant must support it: Postgres does (via PREPARE TRANSACTION); your payment provider does not.
For a single multi-table commit within a Postgres database, 2PC is fine, but you can usually do that with a regular transaction. For cross-service atomicity, 2PC’s failure modes outweigh its benefits in 95% of real systems.
A saga is a sequence of local transactions. Each step’s local transaction either succeeds or fails. If a later step fails, you run compensating transactions for previous steps in reverse order to undo their effects.
Forward:    [pay] → [reserve] → [ship] → [notify]
                                  ↓ fails
Compensate: [refund] ← [release] ← (steps 1-2 undone)
The trade: a saga is not atomic. There is a window where money is captured but inventory is not reserved, so anyone observing the system mid-saga sees an inconsistent intermediate state. Consistency arrives eventually, once the saga completes or the compensations have run.
In return for that trade: each step is a local transaction in its own service. No XA, no coordinator-blocking, no holding locks across the network. Each service stays loosely coupled.
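A minimal sketch of that idea, independent of any framework: each step pairs a forward action with its compensation, and a failure triggers the compensations of the completed steps in reverse order. The `SagaStep` interface and `runSaga` function are illustrative names, and the sketch assumes a failed step's own local transaction rolled back, so it needs no undo.

```typescript
// A saga step: a forward action plus the compensation that undoes it.
// The interface is an illustrative assumption, not a library API.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

type SagaResult =
  | { status: 'completed' }
  | { status: 'compensated'; failedStep: string; undone: string[] };

async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      done.push(step);
    } catch {
      // Undo completed steps newest-first. The failed step is assumed to
      // have rolled back its own local transaction.
      const undone: string[] = [];
      for (const prev of done.slice().reverse()) {
        await prev.compensate();
        undone.push(prev.name);
      }
      return { status: 'compensated', failedStep: step.name, undone };
    }
  }
  return { status: 'completed' };
}
```

With steps pay, reserve, ship, notify and a failure in ship, this runs the compensation for reserve and then for pay, which matches the diagram above. It does not handle a compensation that itself throws.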
A central orchestrator drives the saga forward. It calls service A; on success, calls B; on failure, calls A’s compensation. The state of the saga is a row in the orchestrator’s database, updated as steps complete.
type SagaState =
  | { phase: 'starting' }
  | { phase: 'paid'; paymentId: string }
  | { phase: 'reserved'; paymentId: string; reservationId: string }
  | { phase: 'shipped'; paymentId: string; reservationId: string; shipmentId: string }
  | { phase: 'completed' }
  | { phase: 'compensating'; reason: string; lastSuccessful: string }
  | { phase: 'failed'; reason: string };

async function runOrderSaga(orderId: string) {
  let state: SagaState = { phase: 'starting' };
  await persist(orderId, state);
  try {
    // Persist after every step so the saga row always reflects the last successful step.
    const paymentId = await charge(orderId);
    state = { phase: 'paid', paymentId };
    await persist(orderId, state);

    const reservationId = await reserveInventory(orderId);
    state = { phase: 'reserved', paymentId, reservationId };
    await persist(orderId, state);

    const shipmentId = await ship(orderId);
    state = { phase: 'shipped', paymentId, reservationId, shipmentId };
    await persist(orderId, state);

    await notify(orderId);
    state = { phase: 'completed' };
    await persist(orderId, state);
  } catch (err) {
    // `err` is `unknown` in strict TypeScript, so narrow it before reading `.message`.
    const reason = err instanceof Error ? err.message : String(err);
    await compensate(orderId, state, reason);
  }
}

async function compensate(orderId: string, state: SagaState, reason: string) {
  // Record that compensation has started, so a restarted orchestrator does not resume the forward path.
  await persist(orderId, { phase: 'compensating', reason, lastSuccessful: state.phase });
  // Undo in reverse order of the forward steps: inventory first, then payment.
  if (state.phase === 'shipped' || state.phase === 'reserved') {
    await releaseInventory(state.reservationId);
  }
  if (state.phase === 'shipped' || state.phase === 'reserved' || state.phase === 'paid') {
    await refund(state.paymentId);
  }
  await persist(orderId, { phase: 'failed', reason });
}
Properties:
Costs:
Tools: Temporal, AWS Step Functions, Camunda Zeebe. Each handles persistence, retries, and timeouts for you.
Each service publishes domain events; other services subscribe and react. There is no central coordinator.
[ payment service ] → publishes "OrderPaid" → [ inventory service ]
[ inventory service ] → publishes "InventoryReserved" → [ fulfillment service ]
[ fulfillment service ] → publishes "OrderShipped" → [ notification service ]
Failure path:
[ inventory service ] → publishes "InventoryReservationFailed" → [ payment service ]
                                                                  → triggers refund
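To make the event flow concrete, here is a small sketch with a synchronous in-memory bus standing in for a real broker. The `EventBus` class, the `wireServices` function, and the `OrderPlaced` and `PaymentRefunded` events are assumptions added for illustration; the other event names come from the diagram above.

```typescript
type DomainEvent =
  | { type: 'OrderPlaced'; orderId: string }      // assumed trigger event
  | { type: 'OrderPaid'; orderId: string }
  | { type: 'InventoryReserved'; orderId: string }
  | { type: 'InventoryReservationFailed'; orderId: string }
  | { type: 'OrderShipped'; orderId: string }
  | { type: 'PaymentRefunded'; orderId: string }; // assumed compensation event

type Handler = (event: DomainEvent) => void;

// Minimal synchronous in-memory bus; a real system would use a message broker.
class EventBus {
  private handlers = new Map<DomainEvent['type'], Handler[]>();
  readonly log: string[] = [];

  subscribe(type: DomainEvent['type'], handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  publish(event: DomainEvent): void {
    this.log.push(event.type);
    for (const handler of this.handlers.get(event.type) ?? []) handler(event);
  }
}

// Each service knows only the events it consumes and the events it emits.
function wireServices(bus: EventBus, inStock: (orderId: string) => boolean): void {
  // Payment service: charges on a new order, refunds when reservation fails.
  bus.subscribe('OrderPlaced', e => bus.publish({ type: 'OrderPaid', orderId: e.orderId }));
  bus.subscribe('InventoryReservationFailed', e =>
    bus.publish({ type: 'PaymentRefunded', orderId: e.orderId }));

  // Inventory service: reserves stock or reports failure.
  bus.subscribe('OrderPaid', e => {
    if (inStock(e.orderId)) {
      bus.publish({ type: 'InventoryReserved', orderId: e.orderId });
    } else {
      bus.publish({ type: 'InventoryReservationFailed', orderId: e.orderId });
    }
  });

  // Fulfillment service: ships once stock is reserved.
  bus.subscribe('InventoryReserved', e => bus.publish({ type: 'OrderShipped', orderId: e.orderId }));
}
```

Notice that no single place in this code describes the whole order flow. You reconstruct it by reading every subscription, which is the practical cost of choreography.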
Properties:
Costs:
Choreography is appealing in theory and painful in practice for sagas with more than 3-4 steps. Most production sagas I have seen end up orchestrated.
Three rules that make compensations correct.
1. Compensations must be idempotent. A retry of refund(paymentId) must not refund twice. Use the payment ID as an idempotency key. Most payment providers support this natively.
2. Compensations must be commutative if possible. If two compensations can run, the order should not matter. In practice, you order them deterministically (reverse of forward order), but defensive programming helps.
3. Compensations cannot fail. If a compensation fails, the saga is in an unrecoverable state and requires human intervention. Design compensations as the simplest, most reliable code in the system. If refund cannot run reliably, escalate: log to a dead-letter queue, alert ops, freeze the saga.
The “cannot fail” rule is harder than it sounds. Some compensations are inherently failable (the third party rejects the refund because it has been more than 90 days). Plan for those: an explicit “manual intervention required” state.
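A hedged sketch of rules 1 and 3 together: the refund is retried with the same idempotency key on every attempt, and when retries run out the result is an explicit manual-intervention state instead of a thrown error. `PaymentClient`, `refundWithEscalation`, and `FakePaymentClient` are hypothetical names. The key format is an assumption, and real providers differ in how they accept idempotency keys.

```typescript
type CompensationOutcome =
  | { status: 'refunded'; attempts: number }
  | { status: 'manual_intervention_required'; reason: string };

// Hypothetical payment client (an assumption, not a real provider SDK).
interface PaymentClient {
  refund(paymentId: string, idempotencyKey: string): Promise<void>;
}

async function refundWithEscalation(
  client: PaymentClient,
  paymentId: string,
  maxAttempts: number = 3,
): Promise<CompensationOutcome> {
  let lastError = 'unknown error';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Same key on every attempt, so the provider can deduplicate repeats.
      await client.refund(paymentId, `refund:${paymentId}`);
      return { status: 'refunded', attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // A production version would back off here before retrying.
    }
  }
  // Retries exhausted: freeze the saga and hand it to a human.
  return { status: 'manual_intervention_required', reason: lastError };
}

// In-memory fake: fails the first `failures` calls, refunds at most once per key.
class FakePaymentClient implements PaymentClient {
  private failures: number;
  private refundedKeys = new Set<string>();
  refundCount = 0;

  constructor(failures: number) {
    this.failures = failures;
  }

  async refund(_paymentId: string, idempotencyKey: string): Promise<void> {
    if (this.failures > 0) {
      this.failures--;
      throw new Error('provider unavailable');
    }
    if (this.refundedKeys.has(idempotencyKey)) return; // duplicate: already refunded
    this.refundedKeys.add(idempotencyKey);
    this.refundCount++;
  }
}
```

Returning the escalation as a value keeps "needs a human" in the saga's state, where the orchestrator can persist it and ops can query for it.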
Some actions cannot be undone. Sending an email. Calling an external service that does not support reversal. Triggering a webhook to a partner.
Two strategies:
Make non-compensable steps run last. Order: charge → reserve → ship → notify. If notification came first, you would have to “un-notify,” which is impossible. Putting it last means compensation only ever applies to the early steps, which are designed to be reversible.
Pivot transactions. If you must run a non-compensable action mid-saga, design that step as a “pivot.” Beyond it, the saga always continues forward, never compensates. Your saga state machine has to know this.
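One way the state machine can encode this, as a sketch: tag each step with a kind, and decide what to do on failure from the position of the pivot. The `StepKind` labels and the `onStepFailure` function are illustrative assumptions, not an established API.

```typescript
// 'compensable' steps can be undone, the 'pivot' is the point of no return,
// and 'retriable' steps after it must eventually succeed.
type StepKind = 'compensable' | 'pivot' | 'retriable';

interface StepDef {
  name: string;
  kind: StepKind;
}

type FailureAction =
  | { action: 'compensate'; undo: string[] }   // undo completed steps, newest first
  | { action: 'retry_forward'; step: string }; // past the pivot: keep retrying

function onStepFailure(steps: StepDef[], failedIndex: number): FailureAction {
  const pivotIndex = steps.findIndex(s => s.kind === 'pivot');

  // The pivot already committed, so the saga may only move forward.
  if (pivotIndex !== -1 && failedIndex > pivotIndex) {
    return { action: 'retry_forward', step: steps[failedIndex].name };
  }

  // The failure is at or before the pivot: everything completed so far is compensable.
  const undo = steps.slice(0, failedIndex).map(s => s.name).reverse();
  return { action: 'compensate', undo };
}

// Example definition, assuming shipping is treated as the pivot.
const orderSteps: StepDef[] = [
  { name: 'charge', kind: 'compensable' },
  { name: 'reserve', kind: 'compensable' },
  { name: 'ship', kind: 'pivot' },
  { name: 'notify', kind: 'retriable' },
];
```

With this definition a failure in `ship` still compensates `reserve` and `charge`, while a failure in `notify` only ever retries.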
The orchestrator’s state must be durable. Steps:
This works because each service step is idempotent: you re-call it with the same business key, and it returns the existing result if it already ran. Combined with idempotency on the saga side, “did I already do step 3?” is answered by querying the downstream service.
For a Temporal-style orchestrator, the framework handles this for you. For a homegrown orchestrator, the pattern is roughly the same as the outbox pattern: write the next intended step, dispatch it, mark complete.
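For the homegrown case, a rough sketch of write-intent, dispatch, mark-complete, with recovery after a crash. Everything here is an in-memory stand-in: `sagaLog` plays the orchestrator's durable table, `IdempotentService` plays a downstream service keyed by business key, and `crashAfterDispatch` exists only to simulate a crash. It is synchronous for brevity.

```typescript
type StepStatus = 'intended' | 'completed';

interface StepRecord {
  step: string;
  status: StepStatus;
  result?: string;
}

// Stand-in for the orchestrator's durable table, keyed by saga ID.
const sagaLog = new Map<string, StepRecord[]>();

// Stand-in for a downstream service that is idempotent on a business key.
class IdempotentService {
  private results = new Map<string, string>();
  sideEffects = 0;

  call(businessKey: string): string {
    const existing = this.results.get(businessKey);
    if (existing !== undefined) return existing; // already ran: return stored result
    this.sideEffects++;
    const result = `${businessKey}#${this.sideEffects}`;
    this.results.set(businessKey, result);
    return result;
  }
}

// Runs a saga from scratch or resumes it from whatever the log already holds.
function runOrResume(
  sagaId: string,
  steps: string[],
  service: IdempotentService,
  crashAfterDispatch?: string, // test hook: simulate a crash after this step's dispatch
): void {
  const records = sagaLog.get(sagaId) ?? [];
  sagaLog.set(sagaId, records);

  for (const step of steps) {
    let record = records.find(r => r.step === step);
    if (record?.status === 'completed') continue; // finished before the crash

    if (!record) {
      record = { step, status: 'intended' };      // 1. write the intended step
      records.push(record);
    }
    const result = service.call(`${sagaId}:${step}`); // 2. dispatch (safe to repeat)
    if (step === crashAfterDispatch) throw new Error('simulated crash');

    record.status = 'completed';                  // 3. mark complete
    record.result = result;
  }
}
```

If the process dies between dispatch and mark-complete, the resumed run dispatches the same step again with the same business key, and the downstream service returns the original result without repeating the side effect.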
A saga in flight is multiple service calls scattered across logs. Tracing is mandatory. Each saga gets a trace ID; every service call carries it; the saga’s progression shows up as a single trace in your tracing tool.
For more business-level visibility, store the saga state itself in a table that users and ops can query: “show me all order sagas in ‘compensating’ state.” This is what saves you when 50 sagas are stuck.
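As a sketch of that query, assuming each saga row carries its trace ID and a last-updated timestamp (the `SagaRow` shape and `findStuckSagas` are hypothetical; in practice this would be a SQL query against the orchestrator's table):

```typescript
interface SagaRow {
  sagaId: string;
  traceId: string;   // the same ID carried on every downstream call
  phase: 'starting' | 'paid' | 'reserved' | 'shipped' | 'completed' | 'compensating' | 'failed';
  updatedAt: number; // epoch milliseconds of the last state change
}

// Returns sagas that have sat in `phase` for longer than `olderThanMs`, oldest first.
function findStuckSagas(
  rows: SagaRow[],
  phase: SagaRow['phase'],
  olderThanMs: number,
  now: number,
): SagaRow[] {
  return rows
    .filter(r => r.phase === phase && now - r.updatedAt > olderThanMs)
    .sort((a, b) => a.updatedAt - b.updatedAt);
}
```

Each returned row's `traceId` is the handle for pulling up the full trace of that saga in the tracing tool.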
A practical decision tree:
The right answer is usually #2 or #3. The “we wrote our own choreographed saga across 8 services” case usually ends in tears.
Two-phase commit is a textbook answer that fails in real systems. Sagas, orchestrated or choreographed, are what production uses. Pick orchestration for clarity and choreography for very loose coupling. Make every step idempotent. Design compensations as the most reliable code in the system. Order steps so non-compensable ones come last.
The next time someone says “we need a distributed transaction,” ask “what compensation do we run if step three fails?” The answer is the saga design.
The kind of distributed-systems engineering that turns “we need atomic across services” from a pipe dream into a working saga (orchestrators, compensations, idempotency keys, the metrics that prove it works) is the kind of senior backend skill Yojji’s teams bring to client work.
Yojji is an international custom software development company founded in 2016, with teams across Europe, the US, and the UK. They specialize in the JavaScript ecosystem, cloud platforms, and event-driven backends, including the saga and workflow design that decides whether multi-service flows stay correct as the system grows.