

CASR: A Robust Cyclic Framework for Arbitrary Large-Scale Super-Resolution with Distribution Alignment and Self-Similarity Awareness
Wenhao Guo, Zhaoran Zhao, Peng Lu, Sheng Li, Qian Qiao, DeRui Li · 2026-02-26 · via cs.CV updates on arXiv.org

Arbitrary-scale image super-resolution (ASISR) remains fundamentally limited by cross-scale distribution shift: once the inference scale leaves the training range, noise, blur, and artifacts accumulate sharply. We revisit this challenge from a cross-scale distribution transition perspective and propose CASR, a simple yet highly efficient cyclic SR framework that reformulates ultra-magnification as a sequence of in-distribution scale transitions. This design ensures stable inference at arbitrary scales while requiring only a single model. CASR tackles two major bottlenecks: distribution drift across iterations and patch-wise diffusion inconsistencies. The proposed SSAM module aligns structural distributions via superpixel aggregation, preventing error accumulation, while the SARM module restores high-frequency textures by enforcing correlation-guided consistency and preserving self-similarity structure through correlation alignment. Despite using only a single model, our approach significantly reduces distribution drift, preserves long-range texture consistency, and achieves superior generalization even at extreme magnification.
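
The idea of recasting ultra-magnification as a sequence of in-distribution scale transitions can be illustrated with a small sketch. This is not the authors' implementation: the greedy schedule, the names `plan_scale_steps` and `cyclic_upscale`, the `max_step` training-range limit of 4x, and the `sr_step` callable standing in for the single SR model are all assumptions made for illustration, and the SSAM and SARM modules are not modelled at all.

```python
from typing import Callable, List, TypeVar

Image = TypeVar("Image")


def plan_scale_steps(target_scale: float, max_step: float = 4.0) -> List[float]:
    """Split target_scale into per-pass factors, each at most max_step.

    max_step is a hypothetical upper bound of the model's training range.
    The product of the returned factors equals target_scale.
    """
    if target_scale < 1.0:
        raise ValueError("target_scale must be >= 1")
    if max_step <= 1.0:
        raise ValueError("max_step must be > 1")

    steps: List[float] = []
    remaining = target_scale
    # Greedily take full-size steps while the remaining factor is out of range.
    while remaining > max_step:
        steps.append(max_step)
        remaining /= max_step
    # Finish with the leftover in-range factor (or a single 1x pass).
    if remaining > 1.0 or not steps:
        steps.append(remaining)
    return steps


def cyclic_upscale(
    image: Image,
    target_scale: float,
    sr_step: Callable[[Image, float], Image],
    max_step: float = 4.0,
) -> Image:
    """Apply the same SR model repeatedly, one in-range factor per pass.

    sr_step is a placeholder for a single arbitrary-scale SR model call.
    """
    for factor in plan_scale_steps(target_scale, max_step):
        image = sr_step(image, factor)
    return image
```

Under these assumptions, a 32x request is served as three passes of 4x, 4x, and 2x, so every call to the model stays inside the range it was trained on. The drift that accumulates across such passes is what the abstract says SSAM and SARM are designed to counter.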