

















Abstract: Implicit neural representations have emerged as a promising paradigm for video compression, with recent methods achieving competitive performance on natural video. However, screen content video (common in remote desktop, online education, and cloud gaming) exhibits distinct statistics: sharp edges, limited color palettes, and strong temporal redundancy. Existing neural representation methods, designed for natural scenes, lack mechanisms to exploit these properties, leaving substantial room for improvement. In this paper, we propose NeR-SC, a neural representation framework tailored for screen content video. Building on the SNeRV backbone, NeR-SC introduces three screen-content-specific modules: (i) a learnable color palette that models the discrete color structure of screen content by restricting the low-frequency sub-band to a learned color set; (ii) a multi-gate dense fusion module that replaces sequential feature fusion with dense, attention-gated cross-stage interaction; and (iii) an embedding-level frame skip strategy that bypasses redundant decoder invocations for static frames, with zero training overhead. Experiments on DSCVC and VCD show that NeR-SC achieves average PSNR of 40.32 dB and 41.73 dB, respectively, outperforming representative neural video representation methods and, at low bitrates, surpassing H.264 and H.265. The skip strategy enables real-time decoding with no loss in quality.
| Comments: | Submitted to PRMVAI 2026 |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM) |
| Cite as: | arXiv:2605.27024 [cs.CV] |
| (or arXiv:2605.27024v1 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2605.27024 arXiv-issued DOI via DataCite (pending registration) |
From: Haogang Feng
[v1] Tue, 26 May 2026 13:43:50 UTC (14,981 KB)