





















Abstract: We introduce FloorplanQA, a diagnostic benchmark for evaluating spatial reasoning in large language models (LLMs). FloorplanQA is grounded in structured representations of indoor scenes (e.g., kitchens, living rooms, bedrooms, and bathrooms), encoded symbolically as JSON or XML layouts. The benchmark covers core spatial tasks, including distance measurement, visibility, path finding, and object placement within constrained spaces. Our results across a variety of frontier open-source and commercial LLMs reveal that while models may succeed on shallow queries, they often fail to respect physical constraints or preserve spatial coherence, though they remain mostly robust to small spatial perturbations. FloorplanQA uncovers a blind spot in today's LLMs: inconsistent reasoning about indoor layouts. We hope this benchmark inspires new work on language models that can accurately infer and manipulate spatial and geometric properties in practical settings.
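To make the task types concrete, the following is a minimal sketch of a distance query and a placement check over a symbolic JSON layout. The schema (the `room` and `objects` keys, center coordinates `x`/`y`, footprint `w`/`d`, units in meters) and the helper names `center_distance` and `fits_in_room` are assumptions made for illustration; they are not taken from the FloorplanQA benchmark itself.

```python
import json
import math

# Hypothetical layout. The field names and units are illustrative
# assumptions, not the benchmark's actual schema.
LAYOUT_JSON = """
{
  "room": {"type": "kitchen", "width": 4.0, "depth": 5.0},
  "objects": [
    {"id": "fridge", "x": 0.5, "y": 0.5, "w": 0.8, "d": 0.8},
    {"id": "table",  "x": 3.5, "y": 4.5, "w": 1.0, "d": 1.0}
  ]
}
"""


def find_object(layout, obj_id):
    """Return the object dict with the given id, or raise KeyError."""
    for obj in layout["objects"]:
        if obj["id"] == obj_id:
            return obj
    raise KeyError(obj_id)


def center_distance(layout, id_a, id_b):
    """Euclidean distance between the centers of two objects."""
    a = find_object(layout, id_a)
    b = find_object(layout, id_b)
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])


def fits_in_room(layout, obj):
    """Check that an axis-aligned footprint lies fully inside the room."""
    room = layout["room"]
    half_w, half_d = obj["w"] / 2, obj["d"] / 2
    return (
        obj["x"] - half_w >= 0
        and obj["x"] + half_w <= room["width"]
        and obj["y"] - half_d >= 0
        and obj["y"] + half_d <= room["depth"]
    )


layout = json.loads(LAYOUT_JSON)
distance = center_distance(layout, "fridge", "table")
# A candidate placement that pokes through the right-hand wall.
sofa = {"id": "sofa", "x": 3.8, "y": 1.0, "w": 1.0, "d": 1.0}
sofa_fits = fits_in_room(layout, sofa)
```

A ground-truth answer computed this way could serve as a reference when checking whether a model's reply respects the room's physical constraints. Visibility and path finding would need additional geometry, such as line-of-sight tests or a grid search, which this sketch does not attempt.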
| Comments: | ICML 2026, Project page: this https URL |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2507.07644 [cs.AI] (or arXiv:2507.07644v4 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2507.07644 (arXiv-issued DOI via DataCite) |
From: Fedor Rodionov
[v1] Thu, 10 Jul 2025 11:16:48 UTC (6,333 KB)
[v2] Mon, 6 Oct 2025 12:00:21 UTC (2,339 KB)
[v3] Fri, 30 Jan 2026 12:57:19 UTC (2,870 KB)
[v4] Mon, 25 May 2026 12:09:37 UTC (4,300 KB)