Abstract
This paper addresses the task of answering complex visual questions that require cross-dimensional spatial reasoning, such as going from 2D to 3D. This task, called SpatialQA, can enhance a machine’s spatial cognitive abilities along the chain "plane representation - space reconstruction - semantic inference" and has considerable application value. Existing methods often recognize only 1-D visual objects and relations; they lack the ability to represent content in a cross-dimensional space and fail to capture structured geometric knowledge such as face-face topology and texture details. This causes problems such as texture misalignment and topological confusion, which lead to error accumulation and incorrect answers. To address this problem, we propose a new method with good cross-dimensional reasoning capabilities. Specifically, we first analyze the input image to capture its relations in the 2D plane. To derive the topological relations in 3D space, we employ a dual-channel augmentation technique that retrieves topologically isomorphic examples and geometric rules, supplementing missing but crucial reasoning clues. We then design a multi-perspective verifier that finds inconsistencies in the macroscopic outlines and eliminates incorrect options. Based on the visual clues, we develop a question-guided detector that finely analyzes the texture details and relations of each surface, capturing inconsistencies at the micro level. This corrects the reasoning bias so that the right answer can be derived. Moreover, we create a large-scale dataset with 22,483 samples for evaluation. The results show the effectiveness of our method.
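The coarse-to-fine elimination described in the abstract (a macroscopic outline check followed by a per-surface detail check) can be illustrated with a toy sketch. This is not the authors' implementation: the `Option` fields, the `coarse_to_fine` function, and both check functions are hypothetical names invented for illustration, and the retrieval-augmentation step is omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Option:
    """A candidate answer (hypothetical structure, not from the paper)."""
    label: str
    outline: str                     # coarse descriptor, e.g. overall solid shape
    face_textures: Tuple[str, str]   # fine descriptor: textures on two adjacent faces


def coarse_to_fine(
    options: Sequence[Option],
    outline_ok: Callable[[Option], bool],
    detail_ok: Callable[[Option], bool],
) -> Tuple[List[Option], List[Option]]:
    """Filter options in two stages: outline check first, then detail check."""
    # Stage 1: drop options whose macroscopic outline is inconsistent.
    survivors = [o for o in options if outline_ok(o)]
    # Stage 2: among survivors, drop options with inconsistent surface details.
    finalists = [o for o in survivors if detail_ok(o)]
    return survivors, finalists


# Toy data: assume the unfolded 2D net implies a cube in which a "dot" face
# and a "stripe" face are adjacent (an assumption made up for this example).
options = [
    Option("A", "cube", ("dot", "stripe")),
    Option("B", "pyramid", ("dot", "stripe")),
    Option("C", "cube", ("dot", "dot")),
]
allowed_adjacent = {frozenset(("dot", "stripe"))}

survivors, finalists = coarse_to_fine(
    options,
    outline_ok=lambda o: o.outline == "cube",
    detail_ok=lambda o: frozenset(o.face_textures) in allowed_adjacent,
)
print([o.label for o in survivors], [o.label for o in finalists])
```

Running the cheap outline check first narrows the candidates before the finer texture comparison, which mirrors the order given in the abstract; how the paper actually scores consistency is not specified here.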
- Anthology ID: 2026.findings-acl.1656
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 33093–33111
- URL: https://aclanthology.org/2026.findings-acl.1656/
- Cite (ACL): Dongling Li, Qi Chen, Jianxing Yu, Hanjiang Lai, Yanghui Rao, Wenqing Chen, and Jian Yin. 2026. Answering Cross-Dimensional Geometric Visual Questions by Multi-constraint Spatial Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 33093–33111, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Answering Cross-Dimensional Geometric Visual Questions by Multi-constraint Spatial Reasoning (Li et al., Findings 2026)
- PDF: https://aclanthology.org/2026.findings-acl.1656.pdf
- Checklist: 2026.findings-acl.1656.checklist.pdf
























