ExpOS: Explainable Open-Surgery Skills Assessment Using 3D Hand Reconstruction
Roi Papo, Id · 2026-05-25 · via cs.CV updates on arXiv.org

Abstract: Timely and transparent feedback is essential for effective surgical training, yet current assessment remains dependent on expert observation, limiting scalability and opportunities for autonomous practice. We present ExpOS, an explainable framework for data-driven assessment of open-surgery skills designed to enable automatic, feedback-oriented evaluation. Rather than relying on expert-defined metrics, ExpOS learns discriminative temporal patterns directly from motion data and identifies the segments and behaviors most predictive of skill level. We trained and evaluated the method on 221 videos of medical students performing three open-surgery tasks. Hand poses and tool detections were extracted from each frame to derive kinematic descriptors and global motion statistics. Spatiotemporal hand-tool dynamics were modeled using a temporal convolutional backbone with attention-based pooling to generate frame-level importance maps. These representations were fused with global motion statistics to predict skill level and to provide interpretable feedback. ExpOS provides multi-level explainability by identifying when informative events occur through attention weights and which motion characteristics most influence predictions through global feature analysis. Across tasks, the framework achieved strong correlation with expert ratings, with best performance on fascial closure (r = 0.778, R² = 0.74). These results demonstrate that combining weakly-supervised temporal importance learning with interpretable motion statistics enables scalable and actionable surgical skill assessment.
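The attention-based pooling and fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the temporal convolutional backbone is omitted, and the dot-product scoring vector `query`, the linear head (`coef`, `bias`), and all function names are assumptions introduced only for illustration. It shows how softmax weights over frames can double as a frame-level importance map, and how the pooled feature might be concatenated with global motion statistics before a prediction.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool T per-frame feature vectors into one vector.

    `query` is a hypothetical learned scoring vector (an assumption of this
    sketch). Returns the pooled vector and the per-frame attention weights,
    which can be read as a frame-level importance map.
    """
    # One scalar relevance score per frame (dot product with the query).
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Attention-weighted average of the frame features.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


def fuse_and_predict(
    pooled: Sequence[float],
    global_stats: Sequence[float],
    coef: Sequence[float],
    bias: float,
) -> float:
    """Concatenate pooled features with global statistics, then apply a
    linear head (a stand-in for whatever predictor the paper actually uses)."""
    fused = list(pooled) + list(global_stats)
    return sum(c * x for c, x in zip(coef, fused)) + bias


# Toy usage: three frames with 2-D features; the middle frame scores highest.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
pooled, weights = attention_pool(frames, query=[1.0, 1.0])
most_important_frame = max(range(len(weights)), key=lambda t: weights[t])
score = fuse_and_predict(pooled, global_stats=[0.5], coef=[1.0, 1.0, 1.0], bias=0.0)
```

Because the weights sum to one, each frame's weight can be compared directly with the others to locate when informative events occur, while the coefficients on the global statistics indicate which motion characteristics drive the prediction in this simplified linear setting.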
Comments: 10 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2605.23653 [cs.CV]
  (or arXiv:2605.23653v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.23653

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Roi Papo
[v1] Fri, 22 May 2026 14:06:41 UTC (2,165 KB)