Building Extraction from Remote Sensing Imagery under Hazy and Low-light Conditions: Benchmark and Baseline
Feifei Sang · 2026-04-17 · via cs.CV updates on arXiv.org


Abstract: Building extraction from optical Remote Sensing (RS) imagery suffers from performance degradation under real-world hazy and low-light conditions. However, existing optical methods and benchmarks focus primarily on ideal clear-weather conditions. While SAR offers all-weather sensing, its side-looking geometry causes geometric distortions. To address these challenges, we introduce HaLoBuilding, the first optical benchmark specifically designed for building extraction under hazy and low-light conditions. By leveraging a same-scene multitemporal pairing strategy, we ensure pixel-level label alignment and high fidelity even under extreme degradation. Building upon this benchmark, we propose HaLoBuild-Net, a novel end-to-end framework for building extraction in adverse RS scenarios. At its core, we develop a Spatial-Frequency Focus Module (SFFM) to effectively mitigate meteorological interference on building features by coupling large receptive field attention with frequency-aware channel reweighting guided by stable low-frequency anchors. Additionally, a Global Multi-scale Guidance Module (GMGM) provides global semantic constraints to anchor building topologies, while a Mutual-Guided Fusion Module (MGFM) implements bidirectional semantic-spatial calibration to suppress shallow noise and sharpen weather-induced blurred boundaries. Extensive experiments demonstrate that HaLoBuild-Net significantly outperforms state-of-the-art methods and conventional cascaded restoration-segmentation paradigms on the HaLoBuilding dataset, while maintaining robust generalization on WHU, INRIA, and LoveDA datasets. The source code and datasets are publicly available at: this https URL.
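To make "frequency-aware channel reweighting guided by stable low-frequency anchors" more concrete, the following is a toy sketch only. It is not the SFFM from HaLoBuild-Net, whose internals the abstract does not specify. It assumes a hand-set rule (a learned module would replace it): each channel's weight is the fraction of its spectral energy that lies near the zero-frequency term. The function names `dft2`, `low_freq_ratio`, and `reweight_channels` and the `radius` parameter are hypothetical.

```python
import cmath
import math


def dft2(channel):
    """Naive 2D discrete Fourier transform of a small H x W grid (O(N^4))."""
    h, w = len(channel), len(channel[0])
    spectrum = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    total += channel[y][x] * cmath.exp(1j * angle)
            spectrum[u][v] = total
    return spectrum


def low_freq_ratio(channel, radius=1):
    """Fraction of spectral energy within `radius` bins of the DC term."""
    spectrum = dft2(channel)
    h, w = len(spectrum), len(spectrum[0])
    low = 0.0
    total = 0.0
    for u in range(h):
        for v in range(w):
            energy = abs(spectrum[u][v]) ** 2
            total += energy
            # Frequency indices wrap around, so measure distance to DC both ways.
            du, dv = min(u, h - u), min(v, w - v)
            if max(du, dv) <= radius:
                low += energy
    return low / total if total > 0.0 else 0.0


def reweight_channels(feature_map, radius=1):
    """Scale each channel by its low-frequency energy ratio (assumed rule)."""
    weights = [low_freq_ratio(channel, radius) for channel in feature_map]
    scaled = [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(feature_map, weights)
    ]
    return weights, scaled


# Two hypothetical 4x4 channels: one perfectly smooth, one a checkerboard
# whose energy sits entirely at the highest spatial frequency.
smooth = [[1.0] * 4 for _ in range(4)]
checker = [[(-1.0) ** (x + y) for x in range(4)] for y in range(4)]

weights, reweighted = reweight_channels([smooth, checker])
```

In this toy setup the smooth channel keeps a weight close to 1 and the checkerboard channel is suppressed to a weight close to 0. A practical module would typically use a fast FFT on tensors and learn the weighting rather than fix it by hand.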
Comments: 14 pages, 12 figures, 9 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.15088 [cs.CV]
  (or arXiv:2604.15088v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2604.15088

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Wei Lu
[v1] Thu, 16 Apr 2026 14:49:18 UTC (39,658 KB)