





















Abstract: KV-cache reuse mechanisms increasingly expose priority, duration, offload, routing hints, scheduler modes, and event streams. These mechanisms help preserve reusable prefixes, but they do not by themselves define a portable contract for accepted future-reuse state when resident KV and active live KV cannot both fit. We introduce resident KV claims, a conformance contract that binds future-reuse intent to a materialization predicate, lifecycle state, active/resident feasibility outcome, and claim-level telemetry. In controlled vLLM allocator probes, a 60-block resident claim and a 70-block active prefill exceed an 80-block usable KV pool. Write no-admit prevents the active request from becoming future reusable state, but it still allows active allocation to evict residents from the shared pool. A minimal vLLM prototype shows that hard protected resident claims convert this failure mode into scheduler-visible active refusal with direct blocking-claim attribution. The result is not a production speedup or a new cache-replacement algorithm. It is a runtime contract that turns unreported resident loss into reconstructable active/resident arbitration. A companion MicroRuntime and vLLM litmus suite distinguish ordinary eviction, soft priority, write no-admit, accepted hard claims, materialization failure, demotion, expiry, active refusal, and trace-level outcome reconstruction.
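The 60/70/80-block arithmetic in the abstract can be illustrated with a toy block pool. This is a minimal sketch under stated assumptions, not the paper's prototype and not vLLM's allocator: `ToyKVPool`, `accept_claim`, `allocate_active`, `run_scenario`, and the trace event names are all hypothetical. With soft (evictable) residents, the active prefill takes 50 blocks from the resident claim; with hard claims, the same request is refused and the trace names the blocking claim.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVPool:
    """Toy model of a shared KV block pool (hypothetical, not vLLM's API)."""
    usable_blocks: int
    hard_claims: bool = False                      # True: residents are protected
    resident: dict = field(default_factory=dict)   # claim_id -> blocks held
    active: dict = field(default_factory=dict)     # request_id -> blocks held
    trace: list = field(default_factory=list)      # claim-level telemetry events

    def free_blocks(self) -> int:
        held = sum(self.resident.values()) + sum(self.active.values())
        return self.usable_blocks - held

    def accept_claim(self, claim_id: str, blocks: int) -> bool:
        """Materialize a resident claim only if it fits in free blocks."""
        if blocks > self.free_blocks():
            self.trace.append(("materialization_failure", claim_id, blocks))
            return False
        self.resident[claim_id] = blocks
        self.trace.append(("claim_accepted", claim_id, blocks))
        return True

    def allocate_active(self, request_id: str, blocks: int) -> bool:
        """Allocate live KV, evicting residents only when claims are soft."""
        shortfall = blocks - self.free_blocks()
        if shortfall <= 0:
            self.active[request_id] = blocks
            return True
        evictable = 0 if self.hard_claims else sum(self.resident.values())
        if shortfall > evictable:
            # Refusal is visible and, under hard claims, attributed to blockers.
            blockers = tuple(sorted(self.resident)) if self.hard_claims else ()
            self.trace.append(("active_refused", request_id, blockers))
            return False
        for claim_id in sorted(self.resident):
            if shortfall == 0:
                break
            taken = min(shortfall, self.resident[claim_id])
            self.resident[claim_id] -= taken
            shortfall -= taken
            self.trace.append(("resident_evicted", claim_id, taken))
        # Drop claims that lost every block.
        self.resident = {c: b for c, b in self.resident.items() if b > 0}
        self.active[request_id] = blocks
        return True


def run_scenario(hard_claims: bool) -> ToyKVPool:
    """Replay the abstract's numbers: 80-block pool, 60 resident, 70 active."""
    pool = ToyKVPool(usable_blocks=80, hard_claims=hard_claims)
    pool.accept_claim("prefix-A", 60)
    pool.allocate_active("prefill-1", 70)
    return pool


if __name__ == "__main__":
    for mode in (False, True):
        print("hard_claims =", mode, run_scenario(mode).trace)
```

The toy does not model what happens after the active request finishes, which is where a write no-admit policy would apply; it only shows that, without protected claims, active allocation can shrink a resident claim from the shared pool.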
| Comments: | 20 pages, 4 figures; reproducibility artifacts linked in Appendix A |
| Subjects: | Distributed, Parallel, and Cluster Computing (cs.DC) |
| Cite as: | arXiv:2605.24259 [cs.DC] (or arXiv:2605.24259v1 [cs.DC] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.24259 (arXiv-issued DOI via DataCite, pending registration) |
From: Lukas Stepanek
[v1] Fri, 22 May 2026 22:25:31 UTC (100 KB)