NeR-SC: Adapting Neural Video Representation to Screen Content
Ruohan Shi, · 2026-05-27 · via cs.CV updates on arXiv.org

Abstract: Implicit neural representations have emerged as a promising paradigm for video compression, with recent methods achieving competitive performance on natural video. However, screen content video (common in remote desktop, online education, and cloud gaming) exhibits distinct statistics: sharp edges, limited color palettes, and strong temporal redundancy. Existing neural representation methods, designed for natural scenes, lack mechanisms to exploit these properties, leaving substantial room for improvement. In this paper, we propose NeR-SC, a neural representation framework tailored for screen content video. Building on the SNeRV backbone, NeR-SC introduces three screen-content-specific modules: (i) a learnable color palette that models the discrete color structure of screen content by restricting the low-frequency sub-band to a learned color set; (ii) a multi-gate dense fusion module that replaces sequential feature fusion with dense, attention-gated cross-stage interaction; and (iii) an embedding-level frame skip strategy that bypasses redundant decoder invocations for static frames, with zero training overhead. Experiments on DSCVC and VCD show that NeR-SC achieves 40.32 dB and 41.73 dB average PSNR, respectively, outperforming representative neural video representation methods and, at low bitrates, surpassing H.264 and H.265. The skip strategy enables real-time decoding with no loss in quality.
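As a rough illustration of two ideas mentioned in the abstract, the sketch below shows one plausible reading of an embedding-level frame skip together with the standard PSNR formula. The abstract does not specify how NeR-SC detects static frames, so the tolerance-based comparison of embeddings used here is an assumption, and every name (`decode_with_skip`, `toy_decoder`, `tol`) is hypothetical rather than taken from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple


def psnr(reference: Sequence[float], decoded: Sequence[float],
         peak: float = 255.0) -> float:
    """Standard PSNR in dB: 10 * log10(peak^2 / MSE)."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def decode_with_skip(
    embeddings: Sequence[Sequence[float]],
    decoder: Callable[[Sequence[float]], List[float]],
    tol: float = 0.0,
) -> Tuple[List[List[float]], int]:
    """Decode frames, reusing the last decoded frame when the embedding
    has not moved by more than `tol` (an assumed skip criterion).

    Returns the decoded frames and the number of decoder invocations.
    """
    frames: List[List[float]] = []
    calls = 0
    last_emb = None
    last_frame = None
    for emb in embeddings:
        unchanged = (
            last_emb is not None
            and max(abs(a - b) for a, b in zip(emb, last_emb)) <= tol
        )
        if unchanged:
            # Static frame: skip the decoder and reuse the cached output.
            frames.append(last_frame)
        else:
            last_frame = decoder(emb)
            last_emb = emb
            calls += 1
            frames.append(last_frame)
    return frames, calls


def toy_decoder(emb: Sequence[float]) -> List[float]:
    """Stand-in for a neural decoder: scales the embedding to pixel range."""
    return [255.0 * x for x in emb]


# Four frames, two of which repeat the previous embedding exactly.
embeddings = [[0.0, 1.0], [0.0, 1.0], [0.5, 1.0], [0.5, 1.0]]
frames, calls = decode_with_skip(embeddings, toy_decoder)
print(f"decoder calls: {calls} for {len(frames)} frames")
```

With `tol=0.0` a skipped frame is a copy of an output the decoder would have produced anyway for an identical embedding, which is consistent with the abstract's claim of no quality loss; a nonzero tolerance would trade some quality for fewer decoder calls.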
Comments: Submitted to PRMVAI 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as: arXiv:2605.27024 [cs.CV]
  (or arXiv:2605.27024v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.27024

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Haogang Feng
[v1] Tue, 26 May 2026 13:43:50 UTC (14,981 KB)