





















Abstract:Approximate functional dependencies (AFDs) relax exact functional dependencies by tolerating a bounded degree of violation, making them suited for data quality auditing. Threshold-based discovery returns all dependencies above a user-specified cutoff, but output size is uncontrollable, the right threshold varies across datasets, and widely used measures are sensitive to LHS dimensionality. We study global top-$k$ AFD discovery, where neither the LHS nor the RHS is fixed and the $k$ strongest dependencies under $\mu^+$ are returned directly. The cross-attribute comparability of $\mu^+$ makes such a global ranking well-defined. We prove a Triangle Incompatibility Theorem showing that minimality, global top-$k$ ranking, and exact-$k$ output cannot simultaneously hold under any non-monotonic scoring function, justifying the removal of the minimality requirement. We present two algorithms: TALE-Base, which returns the exact global top-$k$ result by exhaustive level-wise evaluation, and TALE-Opt, which reduces computation through Apriori-style candidate generation, LHS computation reuse, and two complementary pruning rules exploiting exact FD monotonicity and an optimistic upper bound on $\mu^+$. Experiments on 41 real-world datasets show that TALE-Opt achieves pruning ratios up to 99.81\% and speedups over TALE-Base up to 78.81$\times$.
| Subjects: | Databases (cs.DB) |
| Cite as: | arXiv:2605.24925 [cs.DB] |
| (or arXiv:2605.24925v1 [cs.DB] for this version) | |
| https://doi.org/10.48550/arXiv.2605.24925 arXiv-issued DOI via DataCite (pending registration) |
From: Xixian Han [view email]
[v1]
Sun, 24 May 2026 08:04:01 UTC (40 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。