


























# Async Exec Duplicate Completion Investigation

## Scope

- Session: `agent:main:telegram:group:-1003774691294:topic:1`
- Symptom: the same async exec completion for session/run `keen-nexus` was recorded twice in LCM as user turns.
- Goal: determine whether this is more likely duplicate session injection or a plain outbound delivery retry.

## Conclusion

Most likely this is **duplicate session injection**, not a pure outbound delivery retry.

The strongest gateway-side gap is in the **node exec completion path**:

1. When an exec finishes on a node, the node host emits `exec.finished` with the full `runId`.
2. Gateway `server-node-events` converts that into a system event and requests a heartbeat.
3. The heartbeat run injects the drained system event block into the agent prompt.
4. The embedded runner persists that prompt as a new user turn in the session transcript.

If the same `exec.finished` reaches the gateway twice for the same `runId` for any reason (replay, duplicate after reconnect, upstream resend, duplicated producer), OpenClaw currently has **no idempotency check keyed by `runId`/`contextKey`** on this path. The second copy becomes a second user message with the same content.
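
The four steps above can be modeled in a few lines to show why a single replayed event is enough. This is a simplified sketch, not OpenClaw code: `NaiveEventQueue`, `runHeartbeat`, `onExecFinished`, and the `exitCode` field are hypothetical, and the queue deliberately has no idempotency check, matching the gap described.

```typescript
// Hypothetical, simplified model of the exec-completion path; not OpenClaw code.

interface ExecFinished {
  sessionKey: string;
  runId: string;
  exitCode: number; // assumed field, for illustration only
}

type Transcript = { role: "user"; content: string }[];

// Stand-in for the system event queue, with no idempotency check at all.
class NaiveEventQueue {
  private pending = new Map<string, string[]>();

  enqueue(sessionKey: string, text: string): void {
    const list = this.pending.get(sessionKey) ?? [];
    list.push(text);
    this.pending.set(sessionKey, list);
  }

  drain(sessionKey: string): string[] {
    const list = this.pending.get(sessionKey) ?? [];
    this.pending.delete(sessionKey);
    return list;
  }
}

// Stand-in for steps 3 and 4: drain, build a prompt, persist it as a user turn.
function runHeartbeat(queue: NaiveEventQueue, sessionKey: string, transcript: Transcript): void {
  const events = queue.drain(sessionKey);
  if (events.length === 0) return; // nothing to inject
  transcript.push({ role: "user", content: events.join("\n") });
}

// Stand-in for step 2: every delivered event is enqueued, then a heartbeat runs.
function onExecFinished(evt: ExecFinished, queue: NaiveEventQueue, transcript: Transcript): void {
  queue.enqueue(evt.sessionKey, `Exec finished (id=${evt.runId}, code ${evt.exitCode})`);
  runHeartbeat(queue, evt.sessionKey, transcript);
}
```

Calling `onExecFinished` twice with the same payload leaves two identical user turns in the transcript, which is the symptom observed for `keen-nexus`.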

## Exact Code Path

### 1. Producer: node exec completion event

- `src/node-host/invoke.ts:340-360`
- `sendExecFinishedEvent(...)` emits `node.event` with event `exec.finished`.
- Payload includes `sessionKey` and full `runId`.

### 2. Gateway event ingestion

- `src/gateway/server-node-events.ts:574-640`
- Handles `exec.finished`.
- Builds text:
  - `Exec finished (node=..., id=<runId>, code ...)`
- Enqueues it via:
  - `` enqueueSystemEvent(text, { sessionKey, contextKey: runId ? `exec:${runId}` : "exec", trusted: false }) ``
- Immediately requests a wake:
  - `requestHeartbeatNow(scopedHeartbeatWakeOptions(sessionKey, { reason: "exec-event" }))`
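
A sketch of this ingestion step under an assumed payload shape. Only `sessionKey` and `runId` come from the source; `nodeId`, `exitCode`, and the helper names `execContextKey` and `ingestExecFinished` are hypothetical. The exact text format is only partially shown above, so the one used here is an assumption; the `contextKey` expression mirrors the quoted call.

```typescript
// Hypothetical sketch of exec.finished ingestion; not the server-node-events.ts code.

interface ExecFinishedPayload {
  sessionKey: string;
  runId?: string; // full runId from the node host; optional here to show the fallback
  nodeId: string; // assumed field
  exitCode: number; // assumed field
}

interface SystemEventOptions {
  sessionKey: string;
  contextKey: string;
  trusted: boolean;
}

interface WakeRequest {
  sessionKey: string;
  reason: "exec-event";
}

// Same expression as the contextKey argument quoted above.
function execContextKey(runId: string | undefined): string {
  return runId ? `exec:${runId}` : "exec";
}

function ingestExecFinished(p: ExecFinishedPayload): {
  text: string;
  options: SystemEventOptions;
  wake: WakeRequest;
} {
  // Assumed text layout; the real format string is elided in the source.
  const text = `Exec finished (node=${p.nodeId}, id=${p.runId ?? "unknown"}, code ${p.exitCode})`;
  return {
    text,
    options: { sessionKey: p.sessionKey, contextKey: execContextKey(p.runId), trusted: false },
    wake: { sessionKey: p.sessionKey, reason: "exec-event" },
  };
}
```

Because the same payload always yields the same `exec:<runId>` key, that key is a natural idempotency candidate; the problem is that nothing downstream consults it.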

### 3. System event dedupe weakness

- `src/infra/system-events.ts:90-115`
- `enqueueSystemEvent(...)` only suppresses **consecutive duplicate text**:
  - `if (entry.lastText === cleaned) return false`
- It stores `contextKey`, but does **not** use `contextKey` for idempotency.
- After a drain, duplicate suppression resets.

This means a replayed `exec.finished` with the same `runId` can be accepted again later, even though the code already has a stable idempotency candidate (`exec:<runId>`).
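
The weakness can be reproduced with a small model. It assumes, for illustration, that draining deletes the per-session entry and with it `lastText`; the real bookkeeping in `system-events.ts` may differ, and `enqueueSystemEventModel` and `drainModel` are hypothetical names.

```typescript
// Simplified model of consecutive-text-only suppression; not the system-events.ts code.

interface QueuedEvent {
  text: string;
  contextKey?: string; // stored, but never consulted for dedupe (the gap under discussion)
}

interface SessionEntry {
  queue: QueuedEvent[];
  lastText: string | null;
}

const sessions = new Map<string, SessionEntry>();

function enqueueSystemEventModel(text: string, sessionKey: string, contextKey?: string): boolean {
  const cleaned = text.trim();
  const entry = sessions.get(sessionKey) ?? { queue: [], lastText: null };
  sessions.set(sessionKey, entry);
  if (entry.lastText === cleaned) return false; // only back-to-back repeats are rejected
  entry.lastText = cleaned;
  entry.queue.push({ text: cleaned, contextKey });
  return true;
}

function drainModel(sessionKey: string): QueuedEvent[] {
  const entry = sessions.get(sessionKey);
  if (!entry) return [];
  sessions.delete(sessionKey); // assumed: dropping the entry also forgets lastText
  return entry.queue;
}
```

A back-to-back repeat is rejected, but the same text is accepted again after a drain, or whenever any other event was enqueued in between, even though `contextKey` is identical every time.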

### 4. Wake handling is not the primary duplicator

- `src/infra/heartbeat-wake.ts:79-117`
- Wakes are coalesced by `(agentId, sessionKey)`.
- Duplicate wake requests for the same target collapse to one pending wake entry.

This makes **duplicate wake handling alone** a weaker explanation than duplicate event ingestion.
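
Coalescing keyed by `(agentId, sessionKey)` can be sketched as a map with one slot per target. `WakeCoalescer` is a hypothetical name, and letting a later request overwrite an earlier one (rather than keeping the first `reason`) is an assumption.

```typescript
// Hypothetical sketch of wake coalescing; not the heartbeat-wake.ts code.

interface WakeOptions {
  agentId: string;
  sessionKey: string;
  reason: string;
}

class WakeCoalescer {
  private pending = new Map<string, WakeOptions>();

  request(opts: WakeOptions): void {
    // JSON-encoding the pair avoids collisions such as ("a:b", "c") vs ("a", "b:c").
    const key = JSON.stringify([opts.agentId, opts.sessionKey]);
    this.pending.set(key, opts); // same target: overwrite the slot, never add a second one
  }

  pendingCount(): number {
    return this.pending.size;
  }

  // Consume all pending wakes; later requests start filling fresh slots.
  takeAll(): WakeOptions[] {
    const wakes = Array.from(this.pending.values());
    this.pending.clear();
    return wakes;
  }
}
```

Coalescing only merges wakes that are pending at the same time. Once a wake has been consumed, the next request creates a new entry, so a duplicate event that arrives after the first heartbeat ran still gets its own wake.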

### 5. Heartbeat consumes the event and turns it into prompt input

- `src/infra/heartbeat-runner.ts:535-574`
  - Preflight peeks at pending system events and classifies exec-event runs.
- `src/auto-reply/reply/session-system-events.ts:86-90`
  - `drainFormattedSystemEvents(...)` drains the queue for the session.
- `src/auto-reply/reply/get-reply-run.ts:400-427`
  - The drained system event block is prepended to the agent prompt body.
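
The peek, drain, and prepend sequence might look roughly like this. The `System: ` prefix, the blank-line separator, and all function names are assumptions for illustration; they are not the formats or signatures in `session-system-events.ts` or `get-reply-run.ts`.

```typescript
// Hypothetical sketch of peek vs. drain and prompt prepending; not OpenClaw code.

interface PendingEvent {
  text: string;
}

const pendingBySession = new Map<string, PendingEvent[]>();

// Preflight: look at pending events without consuming them.
function peekSystemEvents(sessionKey: string): readonly PendingEvent[] {
  return pendingBySession.get(sessionKey) ?? [];
}

// Drain: consume and format the session's events (assumed "System: " prefix).
function drainFormatted(sessionKey: string): string {
  const events = pendingBySession.get(sessionKey) ?? [];
  pendingBySession.delete(sessionKey); // a later run will not see these again
  return events.map((e) => `System: ${e.text}`).join("\n");
}

// Prepend the drained block to the prompt body, if there is one.
function buildPromptBody(sessionKey: string, body: string): string {
  const block = drainFormatted(sessionKey);
  return block ? `${block}\n\n${body}` : body;
}
```

Peeking leaves the queue intact for preflight classification; only the drain empties it. That is why a copy enqueued after the drain is injected again on the next run instead of being recognized as already handled.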

### 6. Transcript injection point

- `src/agents/pi-embedded-runner/run/attempt.ts:2000-2017`
- `activeSession.prompt(effectivePrompt)` submits the full prompt to the embedded PI session.
- That is the point where the completion-derived prompt becomes a persisted user turn.

So once the same system event is rebuilt into the prompt twice, duplicate LCM user messages are expected.

## Why plain outbound delivery retry is less likely

There is a real outbound failure path in the heartbeat runner:

- `src/infra/heartbeat-runner.ts:1194-1242`
- The reply is generated first.
- Outbound delivery happens later via `deliverOutboundPayloads(...)`.
- A failure there returns `{ status: "failed" }`.

However, for the same system event queue entry, this alone is **not sufficient** to explain the duplicate user turns:

- `src/auto-reply/reply/session-system-events.ts:86-90`
- The system event queue is already drained before outbound delivery.

So a channel send retry by itself would not recreate the exact same queued event. It could explain missing or failed external delivery, but not, by itself, a second identical session user message.
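
The ordering argument can be checked with a hypothetical model of one heartbeat run, where `deliver` stands in for `deliverOutboundPayloads(...)`. The `skipped` status, the `Deps` shape, and `heartbeatOnce` are assumptions; only the `{ status: "failed" }` result comes from the source.

```typescript
// Hypothetical model of drain -> generate -> deliver ordering; not heartbeat-runner.ts.

type RunResult = { status: "ok" } | { status: "failed" } | { status: "skipped" };

interface Deps {
  drain: (sessionKey: string) => string[];
  generateReply: (prompt: string) => string;
  deliver: (reply: string) => void; // stand-in for deliverOutboundPayloads(...); throws on failure
}

function heartbeatOnce(sessionKey: string, deps: Deps): RunResult {
  const events = deps.drain(sessionKey); // queue is emptied before any delivery attempt
  if (events.length === 0) return { status: "skipped" };
  const reply = deps.generateReply(events.join("\n"));
  try {
    deps.deliver(reply);
    return { status: "ok" };
  } catch {
    // A failed send does not put the drained events back into the queue.
    return { status: "failed" };
  }
}
```

With a delivery stub that always throws, the first run returns `failed` and the second returns `skipped`: the failed send never restores the drained event, so it cannot by itself cause a second injection.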

## Secondary, lower-confidence possibility

There is a full-run retry loop in the agent runner:

- `src/auto-reply/reply/agent-runner-execution.ts:741-1473`
- Certain transient failures can retry the whole run and resubmit the same `commandBody`.

That can duplicate a persisted user prompt **within the same reply execution** if the prompt was already appended before the retry condition triggered.
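
This scenario only duplicates the user turn if the prompt is persisted before the transient failure surfaces. A hypothetical sketch of that ordering: `makeFlakyAttempt`, `runWithRetry`, and the `TransientError` name check are illustrative, not the logic in `agent-runner-execution.ts`.

```typescript
// Hypothetical model of a whole-run retry that resubmits the same commandBody.

function transientError(message: string): Error {
  const err = new Error(message);
  err.name = "TransientError"; // tag by name so the check works without subclassing Error
  return err;
}

function isTransient(err: unknown): boolean {
  return err instanceof Error && err.name === "TransientError";
}

// One attempt: persists the user turn first, then fails transiently `transientFailures` times.
function makeFlakyAttempt(transcript: string[], transientFailures: number): (commandBody: string) => string {
  let remaining = transientFailures;
  return (commandBody: string): string => {
    transcript.push(commandBody); // user turn is persisted before the model call
    if (remaining > 0) {
      remaining--;
      throw transientError("transient model/runtime failure");
    }
    return "assistant reply";
  };
}

// Retries the whole run with the same commandBody on transient errors.
function runWithRetry(attempt: (commandBody: string) => string, commandBody: string, maxAttempts: number): string {
  let lastError: unknown = new Error("no attempts made");
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return attempt(commandBody);
    } catch (err) {
      if (!isTransient(err)) throw err;
      lastError = err;
    }
  }
  throw lastError;
}
```

With one transient failure, `runWithRetry` succeeds on the second attempt but leaves two identical transcript entries. In this model they are written back to back within one execution, with no intervening wake.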

I rank this lower than duplicate `exec.finished` ingestion because:

- the observed gap was about 51 seconds, which looks more like a second wake/turn than an in-process retry;
- the report already mentions repeated message send failures, which points more toward a separate later turn than an immediate model/runtime retry.

## Root Cause Hypothesis

Highest-confidence hypothesis:

- The `keen-nexus` completion came through the **node exec event path**.
- The same `exec.finished` was delivered to `server-node-events` twice.
- The gateway accepted both because `enqueueSystemEvent(...)` does not dedupe by `contextKey` / `runId`.
- Each accepted event triggered a heartbeat and was injected as a user turn into the PI transcript.

## Proposed Tiny Surgical Fix

If a fix is wanted, the smallest high-value change is one of:

- make exec/system-event idempotency honor `contextKey` for a short horizon, at least for exact `(sessionKey, contextKey, text)` repeats;
- or add a dedicated dedupe in `server-node-events` for `exec.finished` keyed by `(sessionKey, runId, event kind)`.

Either would directly block replayed `exec.finished` duplicates before they become session turns.
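
A sketch of a short-horizon guard that could back either option. `RecentKeys`, `systemEventKey`, `nodeEventKey`, and the TTL parameter are hypothetical; the source does not specify a horizon. JSON-encoding each tuple keeps keys unambiguous even if a component contains a separator character, and the clock is injectable so expiry can be tested without waiting.

```typescript
// Hypothetical short-horizon idempotency guard; not an existing OpenClaw API.

// Option 1 key: exact (sessionKey, contextKey, text) repeats inside the system event queue.
function systemEventKey(sessionKey: string, contextKey: string, text: string): string {
  return JSON.stringify([sessionKey, contextKey, text]);
}

// Option 2 key: (sessionKey, runId, event kind) in server-node-events.
function nodeEventKey(sessionKey: string, runId: string, eventKind: string): string {
  return JSON.stringify([sessionKey, runId, eventKind]);
}

class RecentKeys {
  private readonly seen = new Map<string, number>(); // key -> expiry time in ms
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // True if the key is new within the TTL (caller proceeds), false if it is a repeat.
  checkAndRemember(key: string): boolean {
    const t = this.now();
    // Lazily evict expired keys; O(n) per call, acceptable for a low-rate event path.
    this.seen.forEach((expiry, k) => {
      if (expiry <= t) this.seen.delete(k);
    });
    if (this.seen.has(key)) return false;
    this.seen.set(key, t + this.ttlMs);
    return true;
  }
}
```

In the gateway handler this would wrap the existing enqueue: if `checkAndRemember(nodeEventKey(sessionKey, runId, "exec.finished"))` returns false, skip both the system event and the wake request. For this incident the horizon would need to exceed the roughly 51-second gap between the two copies.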