

















Abstract:Online surgical phase recognition (SPR) underpins context-aware operating-room systems and requires committing to a prediction at every frame from past context alone. Surgical video poses three demands that natural-video recognizers do not jointly address: procedures span tens of thousands of frames, time flows non-uniformly as long routine stretches are punctuated by brief phase-defining transitions, and the visual domain is narrow so backbone features are strongly correlated across channels. Existing recognizers either let per-frame cost grow with elapsed length, or hold cost bounded but advance state at a uniform rate with channel-independent dynamics, leaving the latter two demands unaddressed. We present SurgicalMamba, a causal SPR model built on Mamba2's structured state-space duality (SSD) that holds per-frame cost at O(d). It introduces three SSD-compatible components that jointly address these demands: a dual-path SSD block that separates long- and short-term regimes at the level of recurrent state; intensity-modulated stepping, a continuous-time time-warp that adapts the slow path's effective rate to phase-relevant information; and state regramming, a per-chunk Cayley rotation that opens cross-channel mixing in the otherwise axis-aligned SSM recurrence. The learned rotation planes inherit a phase-aligned structure without any direct supervision, offering an interpretable internal signature of surgical workflow. Across seven public SPR benchmarks, SurgicalMamba reaches state-of-the-art accuracy and phase-level Jaccard under strict online evaluation: 94.6%/82.7% on Cholec80 (+0.7 pp/+2.2 pp over the strongest prior) and 89.5%/68.9% on AutoLaparo (+1.7 pp/+2.0 pp), at 238.74 fps on a single GPU. Ablations isolate the contribution of each component. The code is publicly available at this https URL.
| Comments: | 28 pages, 7 figures, 10 tables; Code available at this https URL |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI) |
| ACM classes: | I.2.10; I.4.8; J.3 |
| Cite as: | arXiv:2605.14889 [cs.CV] |
| (or arXiv:2605.14889v3 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2605.14889 arXiv-issued DOI via DataCite |
From: Sukju Oh [view email]
[v1]
Thu, 14 May 2026 14:34:55 UTC (4,872 KB)
[v2]
Mon, 18 May 2026 02:24:40 UTC (4,219 KB)
[v3]
Sat, 23 May 2026 05:58:08 UTC (5,159 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。