
BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning
Yizhou Wu, S · 2026-05-01 · via cs.CV updates on arXiv.org


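Abstract: Brain MRI underpins a wide range of neuroscience and clinical applications, yet most learning-based approaches remain task-specific and require large amounts of labeled data. Here we show that a single self-supervised representation can generalize across heterogeneous brain MRI endpoints. We trained BrainDINO on approximately 6.6 million unlabeled axial slices from 20 datasets spanning broad variation in populations, diseases, and acquisition settings. Using a frozen encoder and lightweight task heads, BrainDINO supports transfer across tumor segmentation, classification of neurodegenerative and neurodevelopmental disorders, brain age estimation, post-stroke time prediction, molecular status prediction, MRI sequence classification, and survival modeling. Across tasks and supervision regimes, BrainDINO consistently matches or exceeds natural-image and MRI-specific self-supervised baselines, with particular advantages when labels are scarce. Representation analyses further reveal anatomically organized and pathology-sensitive feature structure in the absence of task-specific supervision. Our study shows that large-scale slice-wise self-supervised learning can yield a unified brain MRI representation that supports diverse neuroimaging tasks without volumetric pretraining or full-network fine-tuning, establishing a scalable foundation for robust and data-efficient brain imaging analysis. Code is available at this https URL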
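The "frozen encoder plus lightweight task head" protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the encoder below is a fixed random projection standing in for a pretrained backbone, the data is synthetic, and every name (`ENCODER_WEIGHTS`, `encode`, `train_head`, `make_toy_data`) is hypothetical. The only point it shows is that the head is fitted on precomputed features while the encoder parameters are never updated.

```python
import math
import random

rng = random.Random(0)

# Hypothetical stand-in for a pretrained encoder: a fixed 8x4 projection + tanh.
# In a real setting this would be a pretrained backbone with gradients disabled.
ENCODER_WEIGHTS = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]


def encode(x):
    """Map a 4-d input to an 8-d feature vector; the weights are read-only."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in ENCODER_WEIGHTS]


def make_toy_data(n):
    """Synthetic two-class data (assumption: stands in for labeled scans)."""
    data = []
    for i in range(n):
        label = i % 2
        center = 1.0 if label else -1.0
        data.append(([center + rng.gauss(0, 0.3) for _ in range(4)], label))
    return data


def train_head(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head by batch gradient descent on fixed features."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in range(dim):
                grad_w[j] += err * f[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(head, feature):
    w, b = head
    return 1 if sum(wi * fi for wi, fi in zip(w, feature)) + b > 0 else 0


train_set, test_set = make_toy_data(200), make_toy_data(100)
weights_before = [row[:] for row in ENCODER_WEIGHTS]

# Features are computed once; only the head's parameters are trained.
train_features = [encode(x) for x, _ in train_set]
head = train_head(train_features, [y for _, y in train_set])

accuracy = sum(predict(head, encode(x)) == y for x, y in test_set) / len(test_set)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the features are computed once and cached, fitting an additional task head costs only the head's own training loop, which is one reason this style of evaluation suits label-scarce settings.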
Comments: 22 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.27277 [cs.LG]
  (or arXiv:2604.27277v2 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2604.27277

arXiv-issued DOI via DataCite

Submission history

From: Yizhou Wu [view email]
[v1] Thu, 30 Apr 2026 00:21:36 UTC (4,815 KB)
[v2] Tue, 26 May 2026 01:34:33 UTC (4,815 KB)