

















Abstract:Open-vocabulary object detection (OVD) has made significant progress, enabling detectors to generalize from seen to unseen categories. However, real-world category spaces continually evolve, and existing OVD models still struggle with newly emerging concepts, while repeated full retraining is prohibitively expensive. To this end, we introduce a new task setting, termed Continual OVD with Novel Concept Injection (COVD), where models sequentially learn incoming novel concept groups while preserving prior concepts and original open-vocabulary knowledge, along with a new benchmark, Novel-114. Our key observation is that pretrained visual encoders often already perceive and represent many novel concepts, and the main bottleneck lies in the lack of stable semantic alignment between visual representations and textual concepts. Based on this, we propose NoIn-Det, an efficient continual injection framework without additional parameters. NoIn-Det freezes the visual encoder, preserves the text representation space using only texts of common concepts and previously injected concepts, and injects novel concepts by updating only a small subset of text-branch parameters beneficial to novel concept learning. Extensive experiments show that NoIn-Det effectively learns novel concepts, preserves old knowledge, and consistently outperforms existing continual learning methods for VLMs without introducing additional this http URL-114 and the code will be released.
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.27116 [cs.CV] |
| (or arXiv:2605.27116v1 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2605.27116 arXiv-issued DOI via DataCite (pending registration) |
From: Yupeng Zhang [view email]
[v1]
Tue, 26 May 2026 14:53:14 UTC (1,699 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。