
























Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets $X$ are sampled randomly from a domain $Ω$, equipped with a distance, $ρ$, and an underlying probability distribution, $μ$. While performing an asymptotic analysis, we send the intrinsic dimension $d$ of $Ω$ to infinity, and assume that the size of a dataset, $n$, grows superpolynomially yet subexponentially in $d$. Exact similarity search refers to finding the nearest neighbour in the dataset $X$ to a query point $ω\inΩ$, where the query points are subject to the same probability distribution $μ$ as datapoints. Let $\mathscr F$ denote a class of all 1-Lipschitz functions on $Ω$ that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets $\{ω\colon f(ω)\geq a\}$, $a\in\R$ is $o(n^{1/4}/\log^2n)$. (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption $d^{O(1)}$ is reasonable.) We deduce the $Ω(n^{1/4})$ lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in $(Ω,X)$. In paricular, this bound is superpolynomial in $d$.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。