

DRIFTS: Optimizing Domain Randomization with Synthetic Data and Weight Interpolation for Fetal Brain Tissue Segmentation
Vladyslav Zalevskyi, Thomas Sanchez, Margaux Roulet, Hélène Lajo · 2024-11-11 · via cs.CV updates on arXiv.org

Fetal brain tissue segmentation in magnetic resonance imaging (MRI) is a crucial tool that supports understanding of neurodevelopment, yet it faces challenges due to the heterogeneity of data coming from different scanners and settings, as well as data scarcity. Recent approaches based on domain randomization, like SynthSeg, have shown great potential for single-source domain generalization by simulating images with randomized contrast and image resolution from the label maps. In this work, we investigate how to maximize the out-of-domain (OOD) generalization potential of SynthSeg-based methods in fetal brain MRI. Specifically, we demonstrate that the simple Gaussian mixture models employed in FetalSynthSeg outperform physics-informed generation methods in terms of OOD generalization. We further show that incorporating intensity clustering significantly enhances generalization in settings with limited label classes by producing more realistic synthetic data. By combining synthetic pretraining with fine-tuning on real images and applying weight-space interpolation between the two models, we propose DRIFTS as an effective and practical solution for single-source domain generalization. DRIFTS consistently outperforms current state-of-the-art models across multiple benchmarks and is, to our knowledge, the first method to achieve accurate brain tissue segmentation on fetal T1-weighted images. We validate our approach on 308 subjects from four datasets acquired at three different sites, covering a range of scanner field strengths (0.55T to 3T) and both T1w and T2w modalities. We conclude with five practical recommendations to guide the development of SynthSeg-based methods for other organs and imaging modalities.
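The weight-space interpolation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the interpolation rule, so this assumes a plain linear blend between a synthetically pretrained checkpoint and its fine-tuned counterpart. The function name `interpolate_weights`, the mixing coefficient `alpha`, and the toy parameter names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flattened parameter values.
Weights = Dict[str, List[float]]


def interpolate_weights(pretrained: Weights, finetuned: Weights, alpha: float) -> Weights:
    """Linearly blend two checkpoints that share the same architecture.

    alpha = 0.0 returns the synthetic-pretrained weights and alpha = 1.0 the
    fine-tuned weights. Linear blending is an assumption of this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if pretrained.keys() != finetuned.keys():
        raise ValueError("checkpoints must contain the same parameter names")

    blended: Weights = {}
    for name, p in pretrained.items():
        f = finetuned[name]
        if len(p) != len(f):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise convex combination of the two parameter vectors.
        blended[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(p, f)]
    return blended


# Toy checkpoints standing in for real network weights.
synthetic_only = {"conv.weight": [0.0, 2.0], "conv.bias": [1.0]}
real_finetuned = {"conv.weight": [1.0, 4.0], "conv.bias": [3.0]}

midpoint = interpolate_weights(synthetic_only, real_finetuned, alpha=0.5)
print(midpoint)  # {'conv.weight': [0.5, 3.0], 'conv.bias': [2.0]}
```

In practice `alpha` would be chosen on held-out data; the value 0.5 here is only for illustration.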