CREST: Curvature-Regulated Event-Centric Sampling for Efficient Long-Video Understanding
Mehrajul Aba · 2026-05-12 · via cs.CV updates on arXiv.org

Abstract: Selecting informative frames from long videos is a combinatorial problem that existing methods address either through efficient heuristics without explicit modeling of query-conditioned temporal structure, or through multi-stage retrieval pipelines with substantial preprocessing cost. We propose CREST, a training-free frame selection method grounded in the temporal geometry of query-frame relevance. CREST is based on the observation that relevance over time exhibits structured local variation: sharp curvature around salient events and flatter regions in redundant segments. By using local curvature to guide selection, CREST allocates a fixed frame budget more effectively across brief decisive events and slowly evolving evidence. Under a fixed backbone and frame budget, CREST achieves higher accuracy than AKS, a lightweight relevance-coverage baseline, on LongVideoBench and VideoMME, while retaining 93-95% of the accuracy of MIRA, a stronger multi-stage retrieval pipeline, at only 3-4% of its preprocessing cost. (Code and implementation details are included in the supplementary material and will be released publicly upon acceptance.) On TempRel, our diagnostic benchmark for temporal frame selection, CREST achieves a 6.88% relative improvement over AKS. Pairwise LLM-as-a-judge evaluation further shows that CREST-selected frames yield more coherent frame-conditioned descriptions, with win rates of 60.58% and 54.50% on the two benchmarks. These results show that local temporal geometry provides a simple and efficient basis for long-video frame selection.
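
The abstract does not spell out the selection rule, so the following is only a minimal sketch of one way curvature-guided selection could work, not the authors' algorithm. It assumes a per-frame query relevance score is already available, approximates local curvature by the absolute second difference of that curve, and greedily picks frames by a combined score while enforcing a minimum temporal gap. The names `discrete_curvature` and `select_frames` and the parameters `alpha` and `min_gap` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def discrete_curvature(relevance: Sequence[float]) -> List[float]:
    """Absolute second difference of a 1-D relevance curve.

    Endpoints have no two-sided neighborhood, so they are assigned 0.
    """
    n = len(relevance)
    curv = [0.0] * n
    for t in range(1, n - 1):
        curv[t] = abs(relevance[t - 1] - 2.0 * relevance[t] + relevance[t + 1])
    return curv


def select_frames(
    relevance: Sequence[float],
    budget: int,
    min_gap: int = 1,
    alpha: float = 1.0,
) -> List[int]:
    """Greedy selection under a fixed frame budget (illustrative assumption).

    Each frame is scored as relevance + alpha * curvature. Frames are visited
    from highest to lowest score, and a frame is kept only if it lies at least
    `min_gap` indices away from every frame already kept. Fewer than `budget`
    frames are returned if the gap constraint cannot be satisfied.
    """
    curv = discrete_curvature(relevance)
    score = [r + alpha * c for r, c in zip(relevance, curv)]
    order = sorted(range(len(score)), key=lambda t: score[t], reverse=True)

    chosen: List[int] = []
    for t in order:
        if len(chosen) == budget:
            break
        if all(abs(t - s) >= min_gap for s in chosen):
            chosen.append(t)
    return sorted(chosen)


if __name__ == "__main__":
    # Toy curve: a brief spike at index 3, then a flat plateau from index 6 on.
    toy_relevance = [0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5]
    print(select_frames(toy_relevance, budget=3, min_gap=2))
```

On the toy curve, the spike at index 3 has the largest second difference and is picked first, while the flat plateau contributes frames only through its relevance values and its onset. The gap constraint is a simple stand-in for coverage. How CREST itself balances the two is not specified in the abstract.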
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2605.09223 [cs.CV]
  (or arXiv:2605.09223v2 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.09223

arXiv-issued DOI via DataCite

Submission history

From: Abdul Mohaimen Al Radi
[v1] Sat, 9 May 2026 23:47:46 UTC (19,742 KB)
[v2] Sun, 24 May 2026 20:57:02 UTC (5,464 KB)