Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait Endoscopy
Jorge Tapias · 2026-05-08 · via cs.CV updates on arXiv.org

Abstract: Increasing evidence supports watch-and-wait (WW) surveillance for patients with rectal cancer who show clinical complete response (cCR) at restaging following total neoadjuvant treatment (TNT). However, objective and accurate methods for early detection of local regrowth (LR) from follow-up endoscopy images during WW are essential to manage care and prevent distant metastases. Hence, we developed a Siamese Swin Transformer with Dual Cross-Attention (SSDCA) that combines longitudinal endoscopic images from restaging and follow-up to distinguish cCR from LR. SSDCA leverages pretrained Swin transformers to extract domain-agnostic features and enhance robustness to imaging variations. Dual cross-attention emphasizes features from the two scans and predicts response without requiring any spatial alignment of the images. SSDCA and Swin-based baselines were trained using image pairs from 135 patients and evaluated on a held-out set of image pairs from 62 patients. SSDCA produced the best balanced accuracy (81.76% ± 0.04), sensitivity (90.07% ± 0.08), and specificity (72.86% ± 0.05). Robustness analysis showed stable performance irrespective of artifacts including blood, stool, telangiectasia, and poor image quality. UMAP clustering of extracted features showed maximal inter-cluster separation (1.45 ± 0.18) and minimal intra-cluster dispersion (1.07 ± 0.19) with SSDCA, confirming discriminative representation learning.
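The dual cross-attention idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify projections, head counts, pooling, or how the two attended streams are fused, so everything below (single-head attention without learned weights, mean pooling, concatenation, and the function names `cross_attention`, `mean_pool`, and `dual_cross_attention`) is an assumption chosen for illustration. Toy token lists stand in for Swin features.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    # Single-head scaled dot-product attention: each query token attends
    # over all tokens of the other image. No learned projections (assumption).
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    attended = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return attended

def mean_pool(tokens):
    # Average token vectors into one feature vector.
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]

def dual_cross_attention(restaging_tokens, followup_tokens):
    # Attend in both directions, pool each stream, and concatenate
    # (the fusion step is hypothetical; the abstract does not describe it).
    r2f = cross_attention(restaging_tokens, followup_tokens)
    f2r = cross_attention(followup_tokens, restaging_tokens)
    return mean_pool(r2f) + mean_pool(f2r)

# Toy features: 3 restaging tokens and 2 follow-up tokens, each of dimension 4.
restaging = [[0.2, 0.1, 0.0, 0.5], [0.9, 0.3, 0.4, 0.1], [0.0, 0.7, 0.2, 0.6]]
followup = [[0.5, 0.5, 0.1, 0.0], [0.1, 0.2, 0.8, 0.3]]
fused = dual_cross_attention(restaging, followup)
print(len(fused))  # 8
```

In this toy version the two images may contribute different numbers of tokens, and reordering the follow-up tokens leaves the fused vector unchanged up to floating-point rounding, which is one way to see why attention-based pairing does not depend on spatial alignment. A real model would add learned query, key, and value projections and a classification head on top of the fused vector.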
Comments: Accepted to ISBI 2026 conference (6 pages, 5 figures, 1 table)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2512.03883 [cs.CV]
  (or arXiv:2512.03883v2 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2512.03883

arXiv-issued DOI via DataCite

Submission history

From: Jorge Tapias Gomez
[v1] Wed, 3 Dec 2025 15:34:29 UTC (1,756 KB)
[v2] Thu, 7 May 2026 16:52:47 UTC (1,756 KB)