BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning
Yizhou Wu, S · 2026-05-01 · via cs.CV updates on arXiv.org


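Abstract: Brain MRI underpins a wide range of neuroscience and clinical applications, yet most learning-based methods are task-specific and require substantial labeled data. Here we show that a single self-supervised representation can generalize across diverse brain MRI endpoints. We trained BrainDINO, a self-distilled foundation model, on approximately 6.6 million unlabeled axial slices from 20 datasets spanning broad variation in population, disease, and acquisition setting. Using a frozen encoder with lightweight task heads, BrainDINO supported transfer across tumor segmentation, classification of neurodegenerative and neurodevelopmental conditions, brain age estimation, stroke-related time prediction, molecular status prediction, MRI sequence classification, and survival modeling. Across tasks and supervision regimes, BrainDINO consistently matched or outperformed natural-image and MRI-specific self-supervised baselines, with particularly strong advantages under label scarcity. Representation analyses revealed anatomically organized and disease-sensitive feature structure in the absence of task-specific supervision. Our findings indicate that large-scale slice-wise self-supervised learning can produce composite brain MRI representations that support diverse neuroimaging tasks without volumetric pretraining or full-network fine-tuning, providing a scalable foundation for robust and data-efficient brain imaging analysis. Code is available at this https URL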
Comments: 22 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.27277 [cs.LG]
  (or arXiv:2604.27277v2 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2604.27277

arXiv-issued DOI via DataCite

Submission history

From: Yizhou Wu
[v1] Thu, 30 Apr 2026 00:21:36 UTC (4,815 KB)
[v2] Tue, 26 May 2026 01:34:33 UTC (4,815 KB)
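
The "frozen encoder with lightweight task heads" setup mentioned in the abstract can be illustrated with a minimal sketch. This is not BrainDINO's code. The encoder below is a hypothetical fixed random projection standing in for a pretrained network, the inputs are synthetic vectors rather than MRI slices, and every function name is invented for illustration. The only point it shows is the training pattern: features are computed once from an encoder whose weights never change, and only a small head is fitted on top.

```python
import math
import random


def make_frozen_encoder(in_dim, feat_dim, seed=0):
    """Return fixed projection weights standing in for a pretrained encoder (assumption)."""
    rng = random.Random(seed)
    # Positive weights keep this toy example easy to reason about.
    return [[rng.uniform(0.1, 1.0) for _ in range(in_dim)] for _ in range(feat_dim)]


def encode(weights, x):
    """Map an input vector to features; the encoder weights are never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def train_head(features, labels, lr=0.5, epochs=50):
    """Fit a bias-free logistic-regression head with plain SGD on fixed features."""
    head = [0.0] * len(features[0])
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(h * fi for h, fi in zip(head, f))
            p = 1.0 / (1.0 + math.exp(-z))
            # Gradient step on the head only; the encoder is not touched.
            head = [h - lr * (p - y) * fi for h, fi in zip(head, f)]
    return head


def predict(head, f):
    """Binary prediction from the sign of the head's logit."""
    return 1 if sum(h * fi for h, fi in zip(head, f)) > 0 else 0


def make_toy_data(n_per_class=20, in_dim=8, seed=1):
    """Two synthetic classes centred at +1 and -1 (not real imaging data)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_per_class):
        for label, centre in ((1, 1.0), (0, -1.0)):
            xs.append([centre + rng.uniform(-0.3, 0.3) for _ in range(in_dim)])
            ys.append(label)
    return xs, ys


def run_demo():
    encoder = make_frozen_encoder(in_dim=8, feat_dim=4)
    xs, ys = make_toy_data()
    feats = [encode(encoder, x) for x in xs]  # computed once, encoder stays frozen
    head = train_head(feats, ys)
    correct = sum(predict(head, f) == y for f, y in zip(feats, ys))
    return correct / len(ys), head


if __name__ == "__main__":
    acc, head = run_demo()
    print(f"training accuracy: {acc:.2f}, head size: {len(head)}")
```

Because the features are precomputed, each new task in this pattern only needs a new head trained on the same cached features, which is one reason such setups tend to be cheap when labels are scarce. The bias term is omitted here only because the toy classes are symmetric about zero.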