
















Adversarial robustness of deep autoencoders (AEs) has received less attention than that of discriminative models, although their compressed latent representations induce ill-conditioned mappings that can amplify small input perturbations and destabilize reconstructions. Existing white-box attacks for AEs, which optimize norm-bounded adversarial perturbations to maximize output damage, often stop at suboptimal attacks. We observe that this limitation stems from vanishing adversarial loss gradients during backpropagation through ill-conditioned layers, caused by near-zero singular values in their Jacobians. To address this issue, we introduce GRILL, a technique that locally restores gradient signals in ill-conditioned layers, enabling more effective norm-bounded attacks. Through extensive experiments across multiple AE architectures, considering both sample-specific and universal attacks under both standard and adaptive attack settings, we show that GRILL significantly increases attack effectiveness, leading to a more rigorous evaluation of AE robustness. Beyond AEs, we provide empirical evidence that modern multimodal architectures with encoder-decoder structures exhibit similar vulnerabilities under GRILL.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。