Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
Roy Turgeman, Tom Tirer · 2025-12-25 · via cs.CV updates on arXiv.org

The data processing inequality is an information-theoretic principle stating that the information content of a signal cannot be increased by processing the observations. In particular, it suggests that there is no benefit in enhancing the signal or encoding it before addressing a classification problem. This assertion can be proven to be true for the case of the optimal Bayes classifier. However, in practice, it is common to perform "low-level" tasks before "high-level" downstream tasks despite the overwhelming capabilities of modern deep neural networks. In this paper, we aim to understand when and why low-level processing can be beneficial for classification. We present a comprehensive theoretical study of a binary classification setup, where we consider a classifier that is tightly connected to the optimal Bayes classifier and converges to it as the number of training samples increases. We prove that for any finite number of training samples, there exists a pre-classification processing that improves the classification accuracy. We also explore the effect of class separation, training set size, and class balance on the relative gain from this procedure. We support our theory with an empirical investigation of the theoretical setup. Finally, we conduct an empirical study where we investigate the effect of denoising and encoding on the performance of practical deep classifiers on benchmark datasets. Specifically, we vary the size and class distribution of the training set, and the noise level, and demonstrate trends that are consistent with our theoretical results.
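The finite-sample claim can be illustrated with a small simulation. The sketch below is a toy construction chosen for illustration and is not the setup or the processing analyzed in the paper. It assumes two balanced Gaussian classes with means `+mu` and `-mu` and identity covariance, where only the first coordinate of `mu` is nonzero, and a plug-in nearest-estimated-mean classifier, which approaches the Bayes rule as the training set grows. The "processing" is an oracle projection onto the first `keep` coordinates. All names (`run`, `fit_plugin`, `bayes_accuracy`, and the parameters) are hypothetical.

```python
import math
import random


def sample(label, mu, rng):
    """Draw x = label * mu + standard Gaussian noise (identity covariance)."""
    return [label * m + rng.gauss(0.0, 1.0) for m in mu]


def mean(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_plugin(pos, neg):
    """Nearest-estimated-mean rule, returned as (w, b): predict +1 iff w.x >= b."""
    m_pos, m_neg = mean(pos), mean(neg)
    w = [p - q for p, q in zip(m_pos, m_neg)]
    b = sum(wi * (p + q) / 2.0 for wi, p, q in zip(w, m_pos, m_neg))
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= b else -1


def bayes_accuracy(signal):
    """Accuracy of the optimal rule for means +/-mu with |mu| = signal: Phi(signal)."""
    return 0.5 * (1.0 + math.erf(signal / math.sqrt(2.0)))


def run(dim=50, keep=1, n_per_class=5, signal=1.0,
        trials=100, n_test=200, seed=0):
    """Return (accuracy on raw inputs, accuracy after projecting to `keep` coords)."""
    rng = random.Random(seed)
    mu = [signal] + [0.0] * (dim - 1)  # assumption: signal lives in coordinate 0
    hits_raw = hits_proj = total = 0
    for _ in range(trials):
        pos = [sample(+1, mu, rng) for _ in range(n_per_class)]
        neg = [sample(-1, mu, rng) for _ in range(n_per_class)]
        w_raw, b_raw = fit_plugin(pos, neg)
        # Pre-classification processing: drop all but the first `keep` coordinates.
        w_prj, b_prj = fit_plugin([x[:keep] for x in pos],
                                  [x[:keep] for x in neg])
        for _ in range(n_test):
            y = rng.choice((-1, 1))
            x = sample(y, mu, rng)
            hits_raw += predict(w_raw, b_raw, x) == y
            hits_proj += predict(w_prj, b_prj, x[:keep]) == y
            total += 1
    return hits_raw / total, hits_proj / total
```

Calling `acc_raw, acc_proj = run()` and comparing both values with `bayes_accuracy(1.0)` shows the effect. The projection discards no class information in this toy model, so the Bayes accuracy is the same before and after processing, yet with only five training samples per class the plug-in rule has far fewer noisy coordinates to estimate and should score higher on the projected data. Raising `n_per_class` should shrink the gap, in line with the abstract's statement that the classifier converges to the Bayes rule as the number of training samples grows.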