惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

WordPress大学
WordPress大学
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
小众软件
小众软件
美团技术团队
腾讯CDC
C
CERT Recently Published Vulnerability Notes
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
博客园 - 三生石上(FineUI控件)
Security Latest
Security Latest
T
Threat Research - Cisco Blogs
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Spread Privacy
Spread Privacy
Cloudbric
Cloudbric
T
Tor Project blog
Google Online Security Blog
Google Online Security Blog
SecWiki News
SecWiki News
D
Darknet – Hacking Tools, Hacker News & Cyber Security
Webroot Blog
Webroot Blog
W
WeLiveSecurity
Google DeepMind News
Google DeepMind News
Latest news
Latest news
P
Privacy International News Feed
P
Privacy & Cybersecurity Law Blog
Attack and Defense Labs
Attack and Defense Labs
T
Threatpost
N
News and Events Feed by Topic
Know Your Adversary
Know Your Adversary
www.infosecurity-magazine.com
www.infosecurity-magazine.com
The Last Watchdog
The Last Watchdog
P
Proofpoint News Feed
L
LINUX DO - 最新话题
Recent Commits to openclaw:main
Recent Commits to openclaw:main
L
LINUX DO - 热门话题
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
The Hacker News
The Hacker News
Cyberwarzone
Cyberwarzone
Project Zero
Project Zero
S
Security Affairs
T
The Exploit Database - CXSecurity.com
C
Cybersecurity and Infrastructure Security Agency CISA
Scott Helme
Scott Helme
V
Vulnerabilities – Threatpost
I
Intezer
S
Security @ Cisco Blogs
TaoSecurity Blog
TaoSecurity Blog
PCI Perspectives
PCI Perspectives
Schneier on Security
Schneier on Security
Hugging Face - Blog
Hugging Face - Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
Jina AI
Jina AI

cs.CV updates on arXiv.org

Demographic and Linguistic Bias Evaluation in Omnimodal Language Models Cross-Cultural Value Awareness in Large Vision-Language Models From UAV Imagery to Agronomic Reasoning: A Multimodal LLM Benchmark for Plant Phenotyping ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories PhysInOne: Visual Physics Learning and Reasoning in One Suite Through Their Eyes: Fixation-aligned Tuning for Personalized User Emulation Neural Distribution Prior for LiDAR Out-of-Distribution Detection Adding Another Dimension to Image-based Animal Detection Long-SCOPE: Fully Sparse Long-Range Cooperative 3D Perception CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation FIRE-CIR: Fine-grained Reasoning for Composed Fashion Image Retrieval Detecting Diffusion-generated Images via Dynamic Assembly Forests Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection Domain-generalizable Face Anti-Spoofing with Patch-based Multi-tasking and Artifact Pattern Conversion Dynamic Class-Aware Active Learning for Unbiased Satellite Image Segmentation Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift Degradation-Robust Fusion: An Efficient Degradation-Aware Diffusion Framework for Multimodal Image Fusion in Arbitrary Degradation Scenarios Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS) MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification BIAS: A Biologically Inspired Algorithm for Video Saliency Detection DeFakeQ: Enabling Real-Time Deepfake Detection on Edge Devices via Adaptive Bidirectional Quantization Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs CatalogStitch: Dimension-Aware and Occlusion-Preserving Object Compositing for Catalog Image Generation Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning SenBen: Sensitive Scene Graphs for Explainable Content Moderation Towards Responsible Multimodal Medical Reasoning via Context-Aligned Vision-Language Models R2G: A Multi-View Circuit Graph Benchmark Suite from RTL to GDSII State Space Models are Effective Sign Language Learners: Exploiting Phonological Compositionality for Vocabulary-Scale Recognition Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring Deep Learning-Based Tracking and Lineage Reconstruction of Ligament Breakup Unified Multimodal Uncertain Inference EfficientSign: An Attention-Enhanced Lightweight Architecture for Indian Sign Language Recognition InsEdit: Towards Instruction-based Visual Editing via Data-Efficient Video Diffusion Models Adaptation 3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding On Semiotic-Grounded Interpretive Evaluation of Generative Art Generative 3D Gaussian Splatting for Arbitrary-ResolutionAtmospheric Downscaling and Forecasting From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity ViSAGE @ NTIRE 2026 Challenge on Video Saliency Prediction Needle in a Haystack: One-Class Representation Learning for Detecting Rare Malignant Cells in Computational Cytology A Semi-Automated Framework for 3D Reconstruction of Medieval Manuscript Miniatures Detection of Hate and Threat in Digital Forensics: A Case-Driven Multimodal Approach HaloProbe: Bayesian Detection and Mitigation of Object Hallucinations in Vision-Language Models Pretrain-then-Adapt: Uncertainty-Aware Test-Time Adaptation for Text-based Person Search R3PM-Net: Real-time, Robust, Real-world Point Matching Network B-MoE: A Body-Part-Aware Mixture-of-Experts "All Parts Matter" Approach to Micro-Action Recognition FDIF: Formula-Driven supervised Learning with Implicit Functions for 3D Medical Image Segmentation CausalVAD: De-confounding End-to-End Autonomous Driving via Causal Intervention Measurement-Consistent Langevin Corrector for Stabilizing Latent Diffusion Inverse Problem Solvers When & How to Write for Personalized Demand-aware Query Rewriting in Video Search Relational Visual Similarity OmniPrism: Learning Disentangled Visual Concept for Image Generation
Edge-preserving noise for diffusion models
Jente Vandersanden, Sascha Holl, Xingchang Huang, Gurprit Singh · 2024-10-02 · via cs.CV updates on arXiv.org

Classical diffusion models typically rely on isotropic Gaussian noise, treating all regions uniformly and overlooking structural information important for high-quality generation. We introduce an edge-preserving diffusion process that generalizes isotropic models via a hybrid noise scheme with an edge-aware scheduler that smoothly transitions from edge-preserving to isotropic noise. This enables the model to capture fine structural details while generally maintaining global performance. We evaluate the impact of structure-aware noise in both diffusion and flow-matching frameworks, and show that existing isotropic models can be efficiently fine-tuned with edge-preserving noise, making our framework practical for adapting pre-trained systems. Beyond unconditional generation, our method particularly shows improvements in structure-guided tasks such as stroke-to-image synthesis, improving robustness and perceptual quality, as evidenced by consistent improvements across FID, KID, and CLIP-score.