This paper has been withdrawn by Ziyang Liu
No PDF available.
Abstract: Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving replies from a cheaper one. Probe-after-return schemes such as SVIP leave a parallel-serve side channel, since a dishonest provider can route the verifier's probe to the advertised model while serving ordinary users from a substitute. We propose a commit-open protocol that closes this gap. Before any opening request, the provider commits, via a Merkle tree, to a per-position sparse-autoencoder (SAE) feature-trace sketch of its served output at a published probe layer. A verifier opens random positions, scores them against a public named-circuit probe library calibrated with cross-backend noise, and decides with a fixed-threshold joint-consistency z-score rule. We instantiate the protocol on three backbones: Qwen3-1.7B, Gemma-2-2B, and a 4.5× scale-up to Gemma-2-9B with a 131k-feature SAE. All 17 attackers, spanning same-family lifts, cross-family substitutes, and adaptive LoRA of rank ≤ 128, are rejected at a shared, scale-stable threshold; the same attackers all evade a matched SVIP-style parallel-serve baseline. A white-box end-to-end attack that backpropagates through the frozen SAE encoder does not close the margin, and a feature-forgery attacker that never runs the honest model M_hon is bounded in closed form by an intrinsic-dimension argument. Commitment adds at most 2.1% to forward-only wall-clock time at batch size 32.
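The commit-open flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sketch encoding (top-k `(feature_index, quantized_activation)` pairs), the SHA-256 Merkle layout with 0x00/0x01 domain-separation prefixes and last-node duplication, and the Stouffer-style combination used for the joint z-score are all assumptions. The functions `leaf_hash`, `build_tree`, `open_proof`, `verify`, `joint_z`, and `accept` are hypothetical names, and `mu`, `sigma`, and `tau` are placeholders rather than the paper's calibrated values.

```python
import hashlib
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical per-position sketch: top-k (SAE feature index, quantized activation) pairs.
Sketch = Tuple[Tuple[int, int], ...]


def leaf_hash(position: int, sketch: Sketch) -> bytes:
    # Prefix 0x00 marks leaves; hashing the position binds each sketch to its slot.
    return hashlib.sha256(b"\x00" + repr((position, sketch)).encode()).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks interior nodes so a leaf can never be passed off as a node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def tree_depth(n: int) -> int:
    # Number of hashing levels above the leaves, i.e. ceil(log2(n)) for n >= 1.
    return (n - 1).bit_length()


def build_tree(sketches: Sequence[Sketch]) -> List[List[bytes]]:
    """Provider side: return every tree level, leaves first and the root last."""
    if not sketches:
        raise ValueError("need at least one served position")
    level = [leaf_hash(i, s) for i, s in enumerate(sketches)]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd-sized levels
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def open_proof(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Provider side: sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for lvl in levels[:-1]:
        sib = index ^ 1
        # A missing right sibling means the node was paired with itself.
        proof.append(lvl[sib] if sib < len(lvl) else lvl[index])
        index //= 2
    return proof


def verify(root: bytes, n: int, position: int, sketch: Sketch, proof: List[bytes]) -> bool:
    """Verifier side: left/right order comes from the position's bits, not the prover."""
    if not 0 <= position < n or len(proof) != tree_depth(n):
        return False
    h, index = leaf_hash(position, sketch), position
    for sibling in proof:
        h = node_hash(h, sibling) if index % 2 == 0 else node_hash(sibling, h)
        index //= 2
    return h == root


def joint_z(scores: Sequence[float], mu: float, sigma: float) -> float:
    # Stouffer-style combination of standardized per-position scores (an assumption).
    return sum((s - mu) / sigma for s in scores) / math.sqrt(len(scores))


def accept(scores: Sequence[float], mu: float, sigma: float, tau: float) -> bool:
    # Fixed threshold: reject when the joint score falls more than tau below calibration.
    return joint_z(scores, mu, sigma) >= -tau


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sketches for 10 served positions; placeholder data, not real SAE output.
    served = [tuple((rng.randrange(16384), rng.randrange(256)) for _ in range(4))
              for _ in range(10)]
    levels = build_tree(served)
    root = levels[-1][0]  # published before the verifier picks any position
    for p in rng.sample(range(len(served)), 3):  # verifier's random challenge
        assert verify(root, len(served), p, served[p], open_proof(levels, p))
    forged = served[4] + ((0, 0),)  # any altered sketch for position 4
    print("forged opening accepted:", verify(root, len(served), 4, forged, open_proof(levels, 4)))
```

Because the root is fixed before any position is challenged, a provider that served users from a substitute cannot later open those positions with sketches from the advertised model without finding a SHA-256 collision. The sketch only covers the commitment layer and the final threshold test; scoring an opened sketch against the probe library is left abstract as the `scores` input.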
| Comments: | We identified inaccuracies in the security analysis: the closed-form intrinsic-dimension lower bound on the feature-forgery attacker (Proposition 4.2, Section 4, Appendix V) and the cross-backend noise calibration for the joint z-score threshold (Section 5.1, Table 2). These affect the claimed attack-resistance guarantees. We are withdrawing the paper to correct them before resubmission. |
| Subjects: | Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2604.18179 [cs.CR] |
|  | (or arXiv:2604.18179v3 [cs.CR] for this version) |
|  | https://doi.org/10.48550/arXiv.2604.18179 (arXiv-issued DOI via DataCite) |
From: Ziyang Liu
[v1] Mon, 20 Apr 2026 12:34:56 UTC (503 KB)
[v2] Sat, 23 May 2026 12:11:38 UTC (1 KB) (withdrawn)
[v3] Tue, 26 May 2026 01:05:19 UTC (1 KB) (withdrawn)