

















Abstract: This article introduces two new measures for authorship attribution, Rank-Turbulence Delta and Jensen-Shannon Delta, which generalise Burrows's classical Delta by applying distance functions designed for probability distributions. We first set out the theoretical basis of the measures, contrasting centred and uncentred z-scoring of word-frequency vectors and recasting the uncentred vectors as probability distributions. Building on this representation, we develop a token-level decomposition that renders every Delta distance numerically interpretable, thereby facilitating close reading and the validation of results. The effectiveness of the methods is assessed on four literary corpora in English, German, French and Russian. The English, German and French datasets are compiled from Project Gutenberg, whereas the Russian benchmark is the SOCIOLIT corpus, which contains 639 works by 89 authors spanning the eighteenth to the twenty-first centuries. Rank-Turbulence Delta attains attribution accuracy comparable with that of Cosine Delta; Jensen-Shannon Delta consistently matches or exceeds the performance of canonical Burrows's Delta. Finally, several established attribution algorithms are re-evaluated on the extended SOCIOLIT corpus, providing a realistic estimate of their robustness under pronounced temporal and stylistic variation.
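
As a rough illustration of the ingredients the abstract names, the sketch below computes classical Burrows's Delta (the mean absolute difference of z-scored word frequencies) and the standard Jensen-Shannon divergence between two word-frequency distributions, keeping the per-token terms so that the total can be read as a sum of token-level contributions. The function names, the toy texts, and the use of plain Jensen-Shannon divergence are assumptions made for illustration only; the paper's own definitions of Jensen-Shannon Delta and Rank-Turbulence Delta are not given in this abstract and are not reproduced here.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one tokenised text."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocabulary]


def burrows_delta(freqs_a, freqs_b, means, stdevs):
    """Classical Burrows's Delta: mean absolute difference of z-scores."""
    z_a = [(f - m) / s for f, m, s in zip(freqs_a, means, stdevs)]
    z_b = [(f - m) / s for f, m, s in zip(freqs_b, means, stdevs)]
    return sum(abs(a - b) for a, b in zip(z_a, z_b)) / len(z_a)


def normalise(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def jensen_shannon_terms(p, q):
    """Per-token terms of the base-2 Jensen-Shannon divergence.

    The divergence is the sum of the returned list, so each entry shows
    how much one token contributes to the overall distance.
    """
    terms = []
    for p_i, q_i in zip(p, q):
        m_i = 0.5 * (p_i + q_i)
        term = 0.0
        if p_i > 0:
            term += 0.5 * p_i * math.log2(p_i / m_i)
        if q_i > 0:
            term += 0.5 * q_i * math.log2(q_i / m_i)
        terms.append(term)
    return terms


if __name__ == "__main__":
    # Hypothetical toy data: three tiny "texts" and a three-word vocabulary.
    vocabulary = ["the", "and", "of"]
    texts = {
        "a": "the cat and the dog of the house".split(),
        "b": "the sound of the sea and of the wind and rain".split(),
        "c": "and then the end of it".split(),
    }
    freqs = {k: relative_frequencies(t, vocabulary) for k, t in texts.items()}

    # Corpus-level mean and standard deviation for each vocabulary word.
    columns = list(zip(*freqs.values()))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]

    print("Burrows's Delta(a, b):",
          burrows_delta(freqs["a"], freqs["b"], means, stdevs))

    # Renormalise over the vocabulary so each vector is a distribution.
    p, q = normalise(freqs["a"]), normalise(freqs["b"])
    terms = jensen_shannon_terms(p, q)
    for word, term in zip(vocabulary, terms):
        print(f"  {word}: {term:.6f}")
    print("Jensen-Shannon divergence(a, b):", sum(terms))
```

Because the corpus mean cancels in the difference of two z-scores, the value returned by `burrows_delta` depends only on the standard deviations, which is one way to see why centred and uncentred scaling can be compared directly for this particular distance.
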
| Comments: | Published in Digital Scholarship in the Humanities. The version of record is available at this https URL. Code available at: this https URL |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2604.19499 [cs.CL] |
| | (or arXiv:2604.19499v4 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2604.19499 (arXiv-issued DOI via DataCite) |
| Journal reference: | Digital Scholarship in the Humanities, 2026 |
| Related DOI: | https://doi.org/10.1093/llc/fqag072 |
From: Dmitry Pronin
[v1] Tue, 21 Apr 2026 14:20:34 UTC (1,872 KB)
[v2] Wed, 22 Apr 2026 08:44:33 UTC (1,872 KB)
[v3] Sun, 26 Apr 2026 09:35:26 UTC (1,870 KB)
[v4] Mon, 25 May 2026 19:02:29 UTC (1,870 KB)