
Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations
Yuanmin Huang · 2026-05-22 · via cs.CV updates on arXiv.org


Abstract: While diffusion models excel at generating high-quality images, their tendency to memorize training data poses significant privacy and copyright risks. In this work, we identify for the first time that memorization induces internal numerical instability, which often manifests as visually "broken" artifacts. Inspired by stability analysis in numerical methods, we introduce empirical stability regions based on latent update norms to quantitatively characterize stable behavior during generation. Building on this, we propose a principled, on-the-fly framework for step-wise detection and adaptive mitigation. Our approach suppresses memorization without altering prompts or guidance, thereby preserving semantic fidelity and image quality. Extensive experiments on Stable Diffusion 1.4 show that our method achieves a detection AUC above 0.999 and a 0.0% memorization rate after mitigation, with negligible overhead (about 0.01 s per image).
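The abstract names the ingredients (latent update norms, empirical stability regions, step-wise detection, adaptive mitigation) but not the exact rules. The sketch below shows one way those pieces could fit together, under loudly stated assumptions: the stability region is a hypothetical per-step band of mean ± k·std (k = 3) computed from reference generations assumed to be non-memorized, a step is flagged when its update norm leaves that band, and mitigation is a swapped-in illustration that rescales an overshooting update back to the band's upper edge. All function names, the band rule, and the toy sampler are hypothetical and are not the paper's method; plain Python lists stand in for flattened latent tensors.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def l2(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance ||v - u|| between two flattened latents."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(u, v)))


def update_norms(latents: Sequence[Vector]) -> List[float]:
    """Norm of each step-to-step latent update along one sampling trajectory."""
    return [l2(prev, cur) for prev, cur in zip(latents, latents[1:])]


def stability_region(
    reference_runs: Sequence[Sequence[float]], k: float = 3.0
) -> List[Tuple[float, float]]:
    """Hypothetical rule: per-step band mean +/- k*std over update-norm
    trajectories of reference generations assumed to be non-memorized."""
    region = []
    for step_norms in zip(*reference_runs):
        mu = statistics.fmean(step_norms)
        sd = statistics.pstdev(step_norms)
        region.append((mu - k * sd, mu + k * sd))
    return region


def clip_update(prev: Vector, proposal: Vector, max_norm: float) -> Vector:
    """Hypothetical mitigation: rescale the update so its norm is at most max_norm."""
    norm = l2(prev, proposal)
    if norm <= max_norm:
        return list(proposal)
    scale = max_norm / norm
    return [a + scale * (b - a) for a, b in zip(prev, proposal)]


def guarded_sampling(
    z0: Vector,
    step_fn: Callable[[Vector, int], Vector],
    region: Sequence[Tuple[float, float]],
) -> Tuple[Vector, List[int]]:
    """Run a sampler step by step, flag steps whose update norm leaves the
    region, and clip updates that overshoot the region's upper bound."""
    z = list(z0)
    flagged: List[int] = []
    for t, (lo, hi) in enumerate(region):
        proposal = step_fn(z, t)
        if not lo <= l2(z, proposal) <= hi:
            flagged.append(t)
        z = clip_update(z, proposal, hi)
    return z, flagged


# Toy usage: three reference trajectories define the band for three steps.
reference = [[1.0, 0.9, 0.8], [1.1, 0.9, 0.7], [0.9, 1.0, 0.8]]
region = stability_region(reference)


def toy_step(z: Vector, t: int) -> Vector:
    # Stand-in for a denoiser: two ordinary steps, then one abnormally large jump.
    jumps = [[1.0, 0.0], [0.0, 0.9], [0.0, 5.1]]
    return [a + d for a, d in zip(z, jumps[t])]


z_final, flagged = guarded_sampling([0.0, 0.0], toy_step, region)
print(flagged)  # [2]
```

Acting only on the latent trajectory, rather than on the text prompt or guidance scale, mirrors the abstract's constraint that mitigation should leave prompts and guidance untouched; whether the actual method clips, rescales, or does something else entirely at a flagged step is not stated in the abstract.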
Comments: KDD 2026, extended version
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2605.22050 [cs.CV]
  (or arXiv:2605.22050v2 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.22050

arXiv-issued DOI via DataCite

Related DOI: https://doi.org/10.1145/3770855.3817770

DOI(s) linking to related resources

Submission history

From: Yuanmin Huang
[v1] Thu, 21 May 2026 06:36:59 UTC (23,812 KB)
[v2] Fri, 22 May 2026 16:38:43 UTC (23,830 KB)