Improving Prostate Gland Segmentation Using Transformer based Architectures
Shatha Abuda · 2026-04-17 · via cs.CV updates on arXiv.org

Abstract: Inter-reader variability and cross-site domain shift challenge the automatic segmentation of prostate anatomy in T2-weighted MRI. This study investigates whether transformer models can retain precision amid such heterogeneity. We compare the performance of UNETR and SwinUNETR in prostate gland segmentation against our previous 3D UNet model [1], based on 546 T2-weighted MRI volumes annotated by two independent experts. Three training strategies were analyzed: a single-cohort dataset, a five-fold cross-validated mixed cohort, and a gland-size-based dataset. Hyperparameters were tuned with Optuna. The test set, drawn from an independent population of readers, served as the evaluation endpoint, measured by the Dice Similarity Coefficient. In single-reader training, SwinUNETR achieved an average Dice score of 0.816 for Reader #1 and 0.860 for Reader #2, while UNETR scored 0.8 and 0.833 for Readers #1 and #2, respectively, compared to the baseline UNet's 0.825 for Reader #1 and 0.851 for Reader #2. In cross-validated mixed training, SwinUNETR had an average Dice score of 0.8583 for Reader #1 and 0.867 for Reader #2. For the gland-size-based dataset, SwinUNETR achieved an average Dice score of 0.902 for the Reader #1 subset and 0.894 for the Reader #2 subset, using the five-fold mixed training strategy (Reader #1, n=53; Reader #2, n=87) on the larger-gland subsets, where UNETR performed poorly. Our findings demonstrate that global and shifted-window self-attention effectively reduce sensitivity to label noise and class imbalance, resulting in improvements in the Dice score over CNNs by up to five points while maintaining computational efficiency. This contributes to the high robustness of SwinUNETR for clinical deployment.
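The abstract reports every result as an average Dice score per reader. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes the Dice Similarity Coefficient, 2|A ∩ B| / (|A| + |B|), for flattened binary masks and averages it over cases. The function names `dice_coefficient` and `mean_dice`, the toy masks, and the convention of scoring two empty masks as 1.0 are assumptions made for this example.

```python
from typing import Iterable, Sequence, Tuple


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled foreground in both the prediction and the annotation.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: scored as perfect agreement (an assumed convention).
        return 1.0
    return 2.0 * intersection / total


def mean_dice(cases: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average Dice over (prediction, annotation) pairs, e.g. one reader's test cases."""
    scores = [dice_coefficient(p, t) for p, t in cases]
    return sum(scores) / len(scores)


# Toy example: 6 voxels, 2 of which overlap; each mask has 3 foreground voxels.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(round(dice_coefficient(pred, target), 4))  # 2*2 / (3+3) -> 0.6667
```

On this scale, a "point" in the abstract corresponds to 0.01 Dice, so the reported scores of 0.816 and 0.825 differ by roughly one point.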
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2506.14844 [eess.IV]
  (or arXiv:2506.14844v2 [eess.IV] for this version)
  https://doi.org/10.48550/arXiv.2506.14844

arXiv-issued DOI via DataCite

Submission history

From: Shatha Abudalou
[v1] Mon, 16 Jun 2025 14:53:50 UTC (925 KB)
[v2] Thu, 16 Apr 2026 09:12:22 UTC (985 KB)