惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

C
Comments on: Blog
S
Schneier on Security
Microsoft Azure Blog
Microsoft Azure Blog
T
Tor Project blog
V
Visual Studio Blog
C
CXSECURITY Database RSS Feed - CXSecurity.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Spread Privacy
Spread Privacy
月光博客
月光博客
罗磊的独立博客
Cisco Talos Blog
Cisco Talos Blog
P
Privacy International News Feed
T
Tenable Blog
阮一峰的网络日志
阮一峰的网络日志
AWS News Blog
AWS News Blog
T
ThreatConnect
博客园 - 三生石上(FineUI控件)
Recorded Future
Recorded Future
Hugging Face - Blog
Hugging Face - Blog
T
Tailwind CSS Blog
博客园 - 叶小钗
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
A
Arctic Wolf
L
LINUX DO - 最新话题
美团技术团队
大猫的无限游戏
大猫的无限游戏
I
Intezer
博客园 - 司徒正美
酷 壳 – CoolShell
酷 壳 – CoolShell
量子位
小众软件
小众软件
T
Threatpost
V
V2EX
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
宝玉的分享
宝玉的分享
The Register - Security
The Register - Security
Project Zero
Project Zero
J
Java Code Geeks
Cyberwarzone
Cyberwarzone
IT之家
IT之家
MyScale Blog
MyScale Blog
T
Threat Research - Cisco Blogs
T
The Blog of Author Tim Ferriss
腾讯CDC
S
SegmentFault 最新的问题
F
Fox-IT International blog
S
Security Archives - TechRepublic
Last Week in AI
Last Week in AI
G
GRAHAM CLULEY
M
MIT News - Artificial intelligence

cs.CV updates on arXiv.org

A Novel Approach for the Counting of Wood Logs Using cGANs and Image Processing Techniques EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution The TIME Machine: On The Power of Motion for Efficient Perception PixIE: Prompted Pixel-Space Low-Light Image Enhancement Spatio-Temporal Similarity Volume Aggregation for Open-Vocabulary Action Recognition Sparser Block-Sparse Attention via Token Permutation PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA CARE: Class-Adaptive Expert Consensus for Reliable Learning with Long-Tailed Noisy Labels B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation Rethinking Transfer Learning for Industrial Inspection: DINOv3 vs. ImageNet Pretraining Across RGB and X-ray Tasks ETCHR: Editing To Clarify and Harness Reasoning Improved Vision-to-Chart Buoy Association with Learned World-to-Image Projection GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction Exploiting Longitudinal Context in Clinician-Verified Interactive Lesion Tracking RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs Lipschitz Optimization for Formal Verification of Homographies General Hazard Detection DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models PhenoYieldNet: Learning Crop-Aware Phenological Responses for Multi-Crop Yield Prediction Recursive Block-Diagonal Coupling for Resource-Efficient Training of Vision Models STAMBRIDGE: Spectral-Temporal Amplitude-aware Mid-Feature Bridge for EEG Visual Decoding Joint Target-Less Intrinsic and Extrinsic Camera-LiDAR Calibration using Deep Point Correspondences CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering MapGCLR: Geospatial Contrastive Learning of Representations for Online Vectorized HD Map Construction CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models Smart-Insertion-V: Photorealistic Video Insertion via a Closed-Loop Feedback Dual-Stream Framework GFSR: Geometric Fidelity and Spatial Refinement for Reliable Lane Detection Semantic-Aware Guided Drone Exploration for Language-Conditioned 3D Indoor Mapping Revitalizing Dense Material Segmentation: Stabilized Vision Transformers and the Generalization Paradox Exploring deep learning for Event-Based Saliency Prediction with a Transformer-based model Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision? EM-Vid: Training-Free Entity-Centric Memory for Efficient and Consistent Multi-Shot Video Generation Do Synthetic Brain MRIs Reliably Improve Tumour Classification? A StyleGAN2-ADA Class-Plane Augmentation Study on BRISC 2025 IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction SLIP-RS: Structured-Attribute Language-Image Pre-Training for Remote Sensing Object Detection Millimeter-wave Imaging for Anthropometric Body Measurement VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations Vision Transformers Need Better Token Interaction DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving GazeBehavior Annotation Toolkit (GBAT): AI-powered toolkit for automatic annotation of egocentric eye-tracking and video data of child-caregiver interaction SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025 VisAnalog: A Diagnostic Suite for Visual Concept Transfer on Natural Images Weierstrass Positional Encoding for Vision Transformers SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion LaMo: Self-Supervised Latent Motion Priors for Physical Realism in Video Generation Beyond Normal References: Discriminative Few-Shot Anomaly Detection Efficient One-Step Diffusion Restoration Model with Compact Token Compression and Linear Attention ComPose: When to Trust Hands for Object Pose Tracking PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation Extending Deep Event Visual Odometry with Sparse Point-Cloud Export GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation Enhancing Blood Cells Classification using Hybrid Quantum Neural Networks MDS-DETR: DETR with Masked Duplicate Suppressor CoMoGen: COntrollable MOtion Dynamics and Interactions with Mask-Guided Video GENeration Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering DepthAgent: Towards Better Universal Depth Estimation via Sample-wise Expert Selection Decoupling Spatio-Temporal Adapter for Fine-Grained Badminton Action Localization CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs Geo-Align: Video Generation Alignment via Metric Geometry Reward One-Forcing: Towards Stable One-Step Autoregressive Video Generation Online Hand Gesture Recognition Using 3D Convolutional Neural Networks From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset ExpOS: Explainable Open-Surgery Skills Assessment Using 3D Hand Reconstruction Multimodal Distribution Matching for Vision-Language Dataset Distillation RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot Generator-Refiner-Examiner: A Tri-Module Data Augmentation Framework for 3D Human Avatar Learning from Monocular Videos DualMem: Bypassing the Objectness Bottleneck for Calibrated Unknown-Stream Filtering in Open-World Object Detection Learning a Particle Dynamics Model with Real-world Videos Scene Reconstruction as Mapping Priors for 3D Detection Calibration-Informative Region Selection for Online LiDAR--Camera Calibration in Agricultural Environments Inconsistency-aware Multimodal Schrödinger Bridge for Deepfake Localization Flow Mismatching: Unsupervised Anomaly Detection via Velocity Discrepancies in Flow Matching Models CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception RoboSurg-VQA: A Multimodal Benchmark for Surgical Segmentation-Aware Visual Question Answering VideoOdyssey: A Benchmark for Ultra-Long-Context and Omni-Modal Video Understanding PhotoFlow: Agentic 3D Virtual Photography Missions Composing People Together: Iterative Pose-Image Generation for Multi-Person Interaction Scenes LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images MuellerPT: Decomposition Driven Pretraining for Dense Learning in Mueller Polarimetry StereoGenBench: A Synthetic Multi-Camera Benchmark for Stereo Generation under Controlled Baseline Regimes ChainFlow-VLA: Causal Flow Planning with Vision-Language Models Machine learning applied to emerald gemstone grading: framework proposal and creation of a public dataset LQ-rPPG: A Label-Quantized Coarse-to-Fine Learning Framework for Remote Physiological Measurement RADAR: Relative Angular Divergence Across Representations HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers Occlusion-Aware Physics-Semantic Keyframe Selection for Robust Video Editing GMENet: Generative Mixture of Experts Network for Multi-Center Glioma Diagnosis with Incomplete Imaging Sequences MedExpMem: Adapting Experience Memory for Differential Diagnosis FAST-ME: Foundation-aware Adaptive Stopping for Motion Estimation for Efficient IoT Video Analysis
What Linear Probes Miss: Multi-View Probing for Weight-Space Learning
Eunwoo Heo, · 2026-05-25 · via cs.CV updates on arXiv.org

View PDF HTML (experimental)

Abstract:The explosive growth of open-source model repositories has created a Model Jungle, where checkpoints are frequently shared without adequate documentation or metadata. While weight-space learning offers a pathway to identify and analyze these models directly from their parameters, processing full-scale weights is computationally prohibitive. Probing-based methods have emerged as a lightweight alternative, extracting permutation-equivariant representations via learnable probe vectors. However, existing probing methods are limited by a single-view design: they capture first-order structures but fail to encode the rich, higher-order correlation patterns inherent in row-column interactions. To bridge this gap, we introduce MVProbe, a multi-perspective probing framework that synthesizes first-order signals with interaction-aware (Gram-based) views. Our approach is theoretically grounded; we analyze the scaling laws of different probing orders to derive a principled standardization and fusion strategy that ensures balanced contributions from all branches. On the Model Jungle benchmark, MVProbe consistently outperforms the state-of-the-art ProbeX across diverse architectures, including discriminative backbones (ResNet, SupViT, MAE, DINO) and large-scale generative LoRA adapters (Stable Diffusion LoRA).
Comments: Accepted at ICML 2026. Code: this https URL ; Project page: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2605.23410 [cs.LG]
  (or arXiv:2605.23410v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2605.23410

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Eunwoo Heo [view email]
[v1] Fri, 22 May 2026 09:18:01 UTC (2,276 KB)