Abstract: LLM-curated hierarchical knowledge bases, tree-structured wikis whose nodes summarize an underlying corpus, have become a dominant substrate for retrieval-augmented applications, yet their storage layer is still treated as an implementation detail. The workload is hierarchical, query-intensive, and continuously evolving, and no existing storage model natively captures all three properties at once. We present WikiKV, a path-indexed key-value storage model purpose-built for this workload. It comprises three components: (i) a data-driven schema that bootstraps the hierarchy via Intent-Anchored Schema Induction and refines it through Continuous Evolution Operators; (ii) a consistency protocol for the path-indexed storage model that precludes partial-read observations under concurrent offline rewrites without read-path locking; and (iii) a budgeted navigation operator whose search-accelerated routing reduces the expected number of LLM-assisted descent steps from d to O(1) while preserving anytime semantics with progressively refined answers. We evaluate WikiKV through a real-world deployment in the WeChat Official Account AI Assistant and benchmark it against diverse baselines on the AuthTrace dataset. There, it achieves balanced, low per-operator latency across four query operators compared with relational, graph, and file-system backends, and it reaches 63.2% end-to-end answer correctness, exceeding multiple RAG baselines, with the gap widening on low- and high-fan-in multi-document questions. An ablation study further confirms the effectiveness of WikiKV's components.
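To make the idea of path-indexed storage with lock-free reads more concrete, the following is a minimal sketch, not WikiKV's actual protocol, which the abstract does not specify. It assumes a plain copy-on-write snapshot scheme as a stand-in technique: a rewrite builds a complete new mapping and publishes it with one reference assignment, so a reader that pinned the earlier snapshot never observes a half-applied rewrite. The names `PathKV`, `snapshot`, `rewrite`, and `children`, and the example paths, are all hypothetical.

```python
from types import MappingProxyType


class PathKV:
    """Toy path-indexed key-value store with copy-on-write snapshots.

    Hypothetical illustration only; not the protocol from the paper.
    """

    def __init__(self, nodes=None):
        # The current snapshot is a read-only view; readers never lock it.
        self._snapshot = MappingProxyType(dict(nodes or {}))

    def snapshot(self):
        # A reader pins one snapshot and uses it for its whole query.
        return self._snapshot

    def rewrite(self, updates, deletes=()):
        # An offline rewrite builds a complete new mapping, then publishes
        # it with a single reference assignment. The old mapping is never
        # mutated, so pinned readers keep a consistent view.
        new_nodes = dict(self._snapshot)
        for path in deletes:
            new_nodes.pop(path, None)
        new_nodes.update(updates)
        self._snapshot = MappingProxyType(new_nodes)


def children(snap, path):
    """Return the direct children of `path` within a pinned snapshot."""
    prefix = path.rstrip("/") + "/"
    found = []
    for key in snap:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            # Keep only keys exactly one level below `path`.
            if rest and "/" not in rest:
                found.append(key)
    return sorted(found)


store = PathKV({
    "/wiki": "root summary",
    "/wiki/storage": "old summary",
    "/wiki/storage/kv": "leaf summary",
})

pinned = store.snapshot()  # a reader starts a query here
store.rewrite(
    {"/wiki/storage": "new summary", "/wiki/retrieval": "added node"},
    deletes=["/wiki/storage/kv"],
)
latest = store.snapshot()  # a reader that starts after the rewrite
```

In this sketch the pinned reader still sees the old summary and the old subtree, while a reader that starts afterwards sees the rewritten hierarchy in full. The linear scan in `children` is chosen for brevity; a real store would presumably use an ordered index so that prefix lookups do not touch every key.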
From: Haoliang Ming
[v1] Fri, 12 Jun 2026 08:58:31 UTC (545 KB)