

















Abstract: This paper presents a comparative evaluation of convolutional and transformer-based object detection architectures for early weed detection in tomato plantations. Representative models from each paradigm are considered, including YOLOv26-nano, a recent variant of the YOLO family, and RT-DETR Large and RF-DETR Medium as transformer-based architectures. The evaluation was conducted on the GROUNDBASED_WEED dataset, considering six weed classes and an additional category for unidentified plants. Performance was assessed in terms of detection accuracy and computational efficiency using precision, recall, average precision, and inference speed, complemented by non-parametric statistical tests. The results highlight a clear trade-off between efficiency and contextual modeling: CNN-based detectors achieve high performance at a lower computational cost, while transformer-based approaches capture global context better at the expense of higher resource demands. These results provide practical criteria for model selection in precision agriculture applications.
| Comments: | 7 pages, 3 figures, and 1 table |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.00908 [cs.CV] (or arXiv:2605.00908v2 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.00908 (arXiv-issued DOI via DataCite) |
From: Alcides Toledo Espinosa
[v1] Wed, 29 Apr 2026 08:23:28 UTC (1,241 KB)
[v2] Fri, 22 May 2026 19:09:23 UTC (16,038 KB)