

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks
Shengyu Zhao, Jonathan Cui, Yilun Sheng, Yue Dong, Xiao Liang, E · 2021-03-19 · via cs.CV updates on arXiv.org

Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet a serious limitation remains: all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Because good quantitative metrics for image completion are lacking, we also propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images compared to real images via linear separability in a feature space. Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion, as well as easy generalization to image-to-image translation. Code is available at https://github.com/zsyzzsoft/co-mod-gan.
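
The idea of scoring fidelity by linear separability can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the classifier or the feature extractor, so the code below assumes a plain logistic-regression classifier, synthetic stand-in features instead of Inception features, and the following working definitions: U-IDS as the classifier's misclassification rate, and P-IDS as the fraction of real/inpainted pairs in which the inpainted sample scores as more realistic than its real counterpart. All function names are hypothetical.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_linear_classifier(real, fake, epochs=200, lr=0.1):
    """Fit logistic regression by gradient descent (assumed classifier).

    Real features get label 1 and fake features label 0, so a positive
    score w.x + b means "looks real" to the classifier.
    """
    dim = len(real[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            # Clamp the logit so math.exp cannot overflow.
            z = max(-30.0, min(30.0, dot(w, x) + b))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def ids_scores(real, fake):
    """Return (p_ids, u_ids) under the working definitions stated above.

    real[i] and fake[i] are assumed to be features of a paired real image
    and the completion produced from its masked version.
    """
    w, b = fit_linear_classifier(real, fake)
    s_real = [dot(w, x) + b for x in real]
    s_fake = [dot(w, x) + b for x in fake]
    # U-IDS: share of all samples the linear classifier gets wrong.
    errors = sum(s <= 0 for s in s_real) + sum(s > 0 for s in s_fake)
    u_ids = errors / (len(real) + len(fake))
    # P-IDS: share of pairs where the fake outscores its paired real.
    p_ids = sum(f > r for r, f in zip(s_real, s_fake)) / len(real)
    return p_ids, u_ids


def sample_features(rng, n, dim, shift):
    """Synthetic stand-in features: Gaussian, first coordinate shifted."""
    return [[rng.gauss(shift if i == 0 else 0.0, 0.5) for i in range(dim)]
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    real = sample_features(rng, 50, 4, 2.0)
    poor_fake = sample_features(rng, 50, 4, -2.0)   # easy to tell apart
    good_fake = sample_features(rng, 50, 4, 2.0)    # same distribution
    print("poor completions:", ids_scores(real, poor_fake))
    print("good completions:", ids_scores(real, good_fake))
```

Under these definitions both scores are higher when the inpainted features are harder to separate from real ones, so higher values indicate better perceptual fidelity. The classifier is fitted and evaluated on the same samples here for brevity; whether a held-out split is appropriate is a design choice the abstract does not settle.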