MultiSense-Pneumo: A Multimodal Learning Framework for Pneumonia Screening in Resource-Limited Settings
Dineth Jayakody · 2026-05-05 · via cs.CV updates on arXiv.org

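Abstract: Pneumonia remains a leading cause of morbidity and mortality worldwide, particularly in low-resource settings where access to imaging, laboratory testing, and specialist care is limited. Clinical assessment draws on heterogeneous evidence such as symptoms, breathing patterns, spoken descriptions, and chest imaging, which makes frontline screening inherently multimodal. However, many existing computational approaches remain unimodal and focus mainly on chest radiographs. In this work, we present MultiSense-Pneumo, a multimodal research prototype for pneumonia-related screening and triage support. The system integrates structured symptom descriptors, cough audio, spoken language, and chest radiographs. It combines deterministic symptom triage, LightGBM-based sound classification, domain-adversarial radiograph analysis with ResNet-18, transformer-based speech recognition, and an interpretable late-fusion operator. Each modality is converted into a normalized concern signal, and these signals are aggregated into a unified screening estimate. The fusion weights are specified manually and treated as interpretable working assumptions rather than learned or clinically optimized values. MultiSense-Pneumo was implemented with offline execution on standard laptop-class hardware in mind, but it is not presented as a deployment-validated or clinically validated diagnostic system. Experimental results show strong component-level performance for the radiograph pathway under synthetic domain shift, while also highlighting important limitations, notably reduced abnormal-class recall for cough audio and the absence of paired end-to-end multimodal patient evaluation. MultiSense-Pneumo is therefore intended as a framework and component-level prototype for screening and triage research.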
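The abstract describes each modality being mapped to a normalized concern signal and then combined by a manually weighted, interpretable late-fusion step. The Python sketch below illustrates that general idea under stated assumptions: min-max normalization, a weighted average that renormalizes over whichever modalities are present, and per-modality contributions returned for inspection. The modality keys, weight values, and the names `normalize` and `fuse` are hypothetical and are not taken from the paper.

```python
from collections.abc import Mapping

# Hypothetical modality keys mirroring the four inputs named in the abstract.
MODALITIES = ("symptoms", "cough_audio", "speech", "radiograph")

# Hypothetical, hand-set fusion weights (illustrative only; not learned,
# not clinically tuned, and not the values used by MultiSense-Pneumo).
DEFAULT_WEIGHTS: Mapping[str, float] = {
    "symptoms": 0.3,
    "cough_audio": 0.2,
    "speech": 0.1,
    "radiograph": 0.4,
}


def normalize(raw: float, lo: float, hi: float) -> float:
    """Min-max scale a raw component output into a [0, 1] concern signal."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (raw - lo) / (hi - lo)))


def fuse(
    signals: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> tuple[float, dict[str, float]]:
    """Weighted late fusion of normalized concern signals.

    Modalities without a weight are ignored. The weights of the modalities
    that are present are renormalized, so a missing input does not pull the
    estimate toward zero. Returns the fused score plus each modality's
    contribution, which keeps the fusion step inspectable.
    """
    present = {m: s for m, s in signals.items() if m in weights}
    total = sum(weights[m] for m in present)
    if total <= 0:
        raise ValueError("no weighted modality available")
    contributions = {m: weights[m] * s / total for m, s in present.items()}
    return sum(contributions.values()), contributions


if __name__ == "__main__":
    # Example: speech is missing, so the other three weights are rescaled.
    score, parts = fuse({"symptoms": 0.5, "cough_audio": 0.25, "radiograph": 0.75})
    print(round(score, 3), {m: round(c, 3) for m, c in parts.items()})
```

Renormalizing over present modalities is one plausible way to handle a missing input such as unavailable speech; the paper may treat missing modalities differently.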
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2605.02207 [cs.CV]
  (or arXiv:2605.02207v2 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.02207

arXiv-issued DOI via DataCite

Submission history

From: Dineth Jayakody
[v1] Mon, 4 May 2026 04:14:35 UTC (950 KB)
[v2] Tue, 26 May 2026 05:28:55 UTC (952 KB)