
























Given a string $S$ of length $N$ on a fixed alphabet of $σ$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the following operations on a grammar-compressed string: $\mbox{rank}_c(S,i)$ (return the number of occurrences of symbol $c$ before position $i$ in $S$); $\mbox{select}_c(S,i)$ (return the position of the $i$th occurrence of $c$ in $S$); and $\mbox{access}(S,i,j)$ (return substring $S[i,j]$). For rank and select we describe data structures of size $O(nσ\log N)$ bits that support the two operations in $O(\log N)$ time. We propose another structure that uses $O(nσ\log (N/n)(\log N)^{1+ε})$ bits and that supports the two queries in $O(\log N/\log\log N)$, where $ε>0$ is an arbitrary constant. To our knowledge, we are the first to study the asymptotic complexity of rank and select in the grammar-compressed setting, and we provide a hardness result showing that significantly improving the bounds we achieve would imply a major breakthrough on a hard graph-theoretical problem. Our main result for access is a method that requires $O(n\log N)$ bits of space and $O(\log N+m/\log_σN)$ time to extract $m=j-i+1$ consecutive symbols from $S$. Alternatively, we can achieve $O(\log N/\log\log N+m/\log_σN)$ query time using $O(n\log (N/n)(\log N)^{1+ε})$ bits of space. This matches a lower bound stated by Verbin and Yu for strings where $N$ is polynomially related to $n$.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。