

VFM$^{4}$SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection
Yupeng Zhang · 2026-04-24 · via cs.CV updates on arXiv.org


Abstract: Real-world variations in weather, illumination, and imaging often induce severe domain shifts that degrade single-source detectors in unseen environments. Existing single-domain generalized object detection (SDGOD) methods rely mainly on data augmentation or domain-invariant learning and largely overlook how domain shift disrupts the stability of detector predictions. Through analytical experiments, we find that the performance degradation is dominated by an increase in missed detections. Further analysis shows that this stems from reduced cross-domain stability in DETR-style detectors: domain shift disrupts encoder-side object-background and inter-instance relations, and it weakens the semantic-spatial binding between decoder queries and real objects. We also find that vision foundation models (VFMs) preserve stable relational structures and object responses even under severe shifts, which makes them suitable cross-domain stability priors to compensate for detector degradation. To this end, we propose VFM$^{4}$SDG, a dual-prior learning framework for SDGOD that introduces a frozen VFM into both encoder representation learning and decoder query modeling. Specifically, we propose Cross-domain Stable Relational Prior Distillation, which distills stable object-background and inter-instance relations from the VFM into the encoder to compensate for relational degradation. We also propose Semantic-Contextual Prior-based Query Enhancement, which injects category semantic prototypes and global object context into the queries before they enter the decoder layers, stabilizing the semantic-spatial binding between queries and objects. Extensive experiments show that VFM$^{4}$SDG significantly outperforms existing advanced methods on standard SDGOD benchmarks and with two mainstream DETR-based detection frameworks, demonstrating its effectiveness, robustness, and generality.
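The abstract does not give the exact form of the two components, so the following is only a minimal sketch of what relational distillation and prior-based query enhancement could look like. It assumes that "relations" are pairwise cosine similarities between token features, that the distillation loss is a mean squared error between the student (encoder) and teacher (frozen VFM) relation matrices, and that queries are enhanced by adding a softmax-weighted mix of category prototypes plus a global context vector. All function names, the `alpha` weight, and these specific loss and fusion choices are hypothetical illustrations, not the authors' implementation.

```python
import math


def cosine_relation_matrix(tokens):
    """Pairwise cosine similarity between token feature vectors (N x N)."""
    # Guard against zero vectors by falling back to a norm of 1.0.
    norms = [math.sqrt(sum(x * x for x in t)) or 1.0 for t in tokens]
    n = len(tokens)
    return [
        [
            sum(a * b for a, b in zip(tokens[i], tokens[j])) / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def relation_distillation_loss(student_tokens, teacher_tokens):
    """Assumed loss: MSE between student and teacher relation matrices.

    Only the N x N relation matrices are compared, so the two feature
    dimensions may differ as long as both inputs hold N tokens.
    """
    rs = cosine_relation_matrix(student_tokens)
    rt = cosine_relation_matrix(teacher_tokens)
    n = len(rs)
    total = sum((rs[i][j] - rt[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


def enhance_queries(queries, prototypes, global_context, alpha=0.5):
    """Assumed fusion: add a prototype mix and a global context to each query."""
    enhanced = []
    for q in queries:
        # Softmax over query-prototype dot products (shifted for stability).
        scores = [sum(a * b for a, b in zip(q, p)) for p in prototypes]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        dim = len(q)
        mix = [sum(w * p[d] for w, p in zip(weights, prototypes)) for d in range(dim)]
        enhanced.append([q[d] + alpha * (mix[d] + global_context[d]) for d in range(dim)])
    return enhanced


if __name__ == "__main__":
    student = [[1.0, 0.0], [0.0, 1.0]]            # toy encoder tokens
    teacher = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # toy frozen-VFM tokens
    print(relation_distillation_loss(student, teacher))
    print(enhance_queries([[1.0, 0.0]], [[2.0, 0.0], [0.0, 2.0]], [0.1, 0.1]))
```

Comparing relation matrices rather than raw features is one common way to transfer structure from a teacher whose feature dimension differs from the student's; whether the paper uses this particular formulation is not stated in the abstract.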
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.21502 [cs.CV]
  (or arXiv:2604.21502v2 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2604.21502

arXiv-issued DOI via DataCite

Submission history

From: Yupeng Zhang
[v1] Thu, 23 Apr 2026 10:04:36 UTC (761 KB)
[v2] Fri, 22 May 2026 15:42:19 UTC (1,535 KB)