Efficient All-Pairs Correlation Volume Sampling for Optical Flow Estimation
Karlis Marti · 2026-05-27 · via cs.CV updates on arXiv.org

Abstract: Recent optical flow estimation methods often employ local cost sampling from a dense all-pairs correlation volume. This results in computational and memory complexity that is quadratic in the number of pixels. Although an alternative memory-efficient implementation with on-demand cost computation exists, it is significantly slower in practice, so many prior methods process images at downsampled resolutions and miss fine-grained details. To address this, we propose an algorithm for a memory- and compute-efficient implementation of all-pairs correlation volume sampling that still matches the exact mathematical operator defined by RAFT. Our approach outperforms on-demand sampling by up to 92% while maintaining equally low memory usage, and performs at least on par with the default implementation while using up to 99% less memory. Because cost sampling makes up a significant portion of the overall runtime, this can translate to savings of up to 63% in total end-to-end model inference on high-resolution inputs. Our evaluation of existing methods includes an 8K ultra-high-resolution dataset and an inference-time extension of the SEA-RAFT method. With this, we achieve state-of-the-art results at high resolutions in both accuracy and runtime.
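
To make the trade-off in the abstract concrete, the following is a minimal sketch of the two baseline strategies it contrasts: sampling from a fully materialized all-pairs correlation volume, and computing the same costs on demand. It is not the algorithm proposed in the paper. The function names (`dense_lookup`, `on_demand_lookup`, `window`) are hypothetical, and the sketch assumes a single resolution level, integer lookup centers without bilinear interpolation, zero cost for out-of-bounds targets, and a dot product scaled by 1/sqrt(D) as the correlation.

```python
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def window(center, radius, H, W):
    """Yield flat target indices in a (2r+1) x (2r+1) window, None if out of bounds."""
    cy, cx = center
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ty, tx = cy + dy, cx + dx
            yield ty * W + tx if 0 <= ty < H and 0 <= tx < W else None


def dense_lookup(f1, f2, centers, radius, H, W):
    """Materialize the full (H*W) x (H*W) volume first, then sample windows from it."""
    scale = 1.0 / math.sqrt(len(f1[0]))
    # Quadratic memory: one cost for every (source pixel, target pixel) pair.
    volume = [[dot(a, b) * scale for b in f2] for a in f1]
    return [
        [0.0 if j is None else volume[i][j]
         for j in window(centers[i], radius, H, W)]
        for i in range(H * W)
    ]


def on_demand_lookup(f1, f2, centers, radius, H, W):
    """Compute only the costs inside each window; the full volume is never stored."""
    scale = 1.0 / math.sqrt(len(f1[0]))
    return [
        [0.0 if j is None else dot(f1[i], f2[j]) * scale
         for j in window(centers[i], radius, H, W)]
        for i in range(H * W)
    ]


if __name__ == "__main__":
    random.seed(0)
    H, W, D, radius = 4, 5, 8, 2  # toy sizes chosen only for illustration
    f1 = [[random.gauss(0, 1) for _ in range(D)] for _ in range(H * W)]
    f2 = [[random.gauss(0, 1) for _ in range(D)] for _ in range(H * W)]
    # One integer lookup center (row, col) in the second image per source pixel.
    centers = [(random.randrange(H), random.randrange(W)) for _ in range(H * W)]
    dense = dense_lookup(f1, f2, centers, radius, H, W)
    lazy = on_demand_lookup(f1, f2, centers, radius, H, W)
    same = all(math.isclose(a, b) for ra, rb in zip(dense, lazy) for a, b in zip(ra, rb))
    print("outputs match:", same)
```

Both functions return the same sampled costs per source pixel. The dense variant stores (H*W)^2 values before sampling, while the on-demand variant stores only the (2r+1)^2 sampled values per pixel but recomputes dot products on every lookup.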
Comments: CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2505.16942 [cs.CV]
  (or arXiv:2505.16942v2 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2505.16942

arXiv-issued DOI via DataCite

Submission history

From: Karlis Martins Briedis
[v1] Thu, 22 May 2025 17:30:38 UTC (3,245 KB)
[v2] Tue, 26 May 2026 13:43:34 UTC (3,542 KB)