Rationale for the pseudo-random threshold? [LWN.net]
Wol · 2026-05-05 · via LWN.net comments

Rationale for the pseudo-random threshold?

Posted May 5, 2026 14:54 UTC (Tue) by Wol (subscriber, #4433)
In reply to: Rationale for the pseudo-random threshold? by marcH
Parent article: Version-controlled databases using Prolly trees

Rationale for the pseudo-random threshold?

Posted May 5, 2026 22:12 UTC (Tue) by marcH (subscriber, #57642) [Link] (6 responses)

I'm not wondering about hashes. I'm wondering about the _threshold_ hashes are compared against. Why and how that threshold is variable. Hashes should be "variable enough" already, no? I most likely missed something.

Rationale for the pseudo-random threshold?

Posted May 6, 2026 7:00 UTC (Wed) by daroc (editor, #160859) [Link] (5 responses)

The threshold is variable to change the distribution of node sizes. If you use a fixed threshold, you get a lopsided distribution where the mean node is half-full but the modal node only has one or two elements. It's more efficient to have a low threshold for earlier items and a higher threshold for later items so that you can make the distribution of node sizes follow a bell curve.
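
To make the effect on node sizes concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the names (key_hash, chunk_sizes, fixed, ramped, TARGET), the use of truncated SHA-256 as the hash, and especially the linear ramp, which is just one simple way to get "low threshold early, higher threshold later". It is not taken from Dolt or from the article.

```python
import hashlib
from collections import Counter

MAX_HASH = 2**32   # hashes are 32-bit values in [0, MAX_HASH)
TARGET = 16        # hypothetical desired node size


def key_hash(key):
    """Deterministic 32-bit hash of an integer key (truncated SHA-256)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def chunk_sizes(n_keys, threshold_for_size):
    """Split keys 0..n_keys-1 into nodes.

    A node ends at a key whose hash falls below the threshold that
    applies to the node's current size. A trailing partial node is dropped.
    """
    sizes, current = [], 0
    for key in range(n_keys):
        current += 1
        if key_hash(key) < threshold_for_size(current):
            sizes.append(current)
            current = 0
    return sizes


def fixed(size):
    """Same threshold for every item: split probability is always 1/TARGET."""
    return MAX_HASH // TARGET


def ramped(size):
    """Assumed linear ramp: split probability grows with the node's size."""
    return min(MAX_HASH, MAX_HASH * size // (TARGET * TARGET))


def mode(sizes):
    """Most common node size."""
    return Counter(sizes).most_common(1)[0][0]


def tiny_fraction(sizes):
    """Fraction of nodes holding only one or two elements."""
    return sum(1 for s in sizes if s <= 2) / len(sizes)


if __name__ == "__main__":
    for name, rule in (("fixed", fixed), ("ramped", ramped)):
        sizes = chunk_sizes(200_000, rule)
        print(name, "mode:", mode(sizes), "tiny nodes:", tiny_fraction(sizes))
```

With the fixed rule the node sizes follow a geometric distribution, so very small nodes are the most common. With the ramped rule small nodes become rare and the sizes cluster around a hump near the target.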

Rationale for the pseudo-random threshold?

Posted May 6, 2026 14:05 UTC (Wed) by Wol (subscriber, #4433) [Link] (4 responses)

Rationale for the pseudo-random threshold?

Posted May 6, 2026 15:52 UTC (Wed) by farnz (subscriber, #17727) [Link]

A good hash should have a flat distribution across all possible inputs: every output value is equally likely, so the distribution has no mode at all. It therefore will not form Gaussian noise automatically (the form of noise that gives you a bell curve, where the mode is well-defined and equal to the mean). Instead, the distribution of noise you get by using a hash function is an artefact of the data you supplied to the hash function; for non-adversarial inputs (e.g. where you've keyed the hash so an attacker can't control the input to the hash function), the mean should be approximately half the maximum value, but the distribution is still unknown.

Because the hash distribution is an artefact of the input data, you need some way to ensure that an unfortunate distribution of hashes won't result in nodes that are consistently too large or too small. The simplest way to do this reliably is to vary the threshold for the next node split point based on the size of previous nodes - if the average node so far has been too small, adjust the threshold to get larger nodes in future, while if the average node has been too large, adjust the threshold to get smaller nodes in future.
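
A rough sketch of that feedback idea, under stated assumptions: skewed_hash is a deliberately bad, hypothetical hash standing in for an "unfortunate distribution", and the exponential update rule, the gain value and the names node_sizes and mean are invented for illustration. This is not a description of what Dolt does.

```python
import hashlib
import math

MAX_HASH = 2**32  # nominal hash range is [0, MAX_HASH)


def skewed_hash(key):
    """Hypothetical unlucky hash: values are squeezed into the lowest
    quarter of the range, so a fixed threshold splits far too often."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") // 4


def node_sizes(n_keys, target, gain=0.0):
    """Split keys 0..n_keys-1 into nodes, ending a node when the hash
    falls below the threshold.

    With gain == 0 the threshold never changes. With gain > 0 it is
    nudged after every finished node: nodes larger than the target raise
    the threshold (more splits, smaller nodes), while nodes smaller than
    the target lower it (fewer splits, larger nodes).
    """
    threshold = MAX_HASH / target
    sizes, current = [], 0
    for key in range(n_keys):
        current += 1
        if skewed_hash(key) < threshold:
            sizes.append(current)
            # Assumed update rule: the accumulated effect depends on how
            # far the average node size so far is from the target.
            threshold *= math.exp(gain * (current - target) / target)
            threshold = min(max(threshold, 1.0), float(MAX_HASH))
            current = 0
    return sizes


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    print("fixed threshold:   ", mean(node_sizes(200_000, 16, gain=0.0)))
    print("adjusted threshold:", mean(node_sizes(200_000, 16, gain=0.05)))
```

With the skewed hash, the fixed threshold produces nodes far smaller than the target of 16, while the adjusted threshold pulls the average node size back towards the target without knowing anything about the hash distribution in advance.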

More complex strategies also exist for setting the threshold - I've not looked to see how Dolt handles this - but the key to why you need a varying threshold is simply that the shape of the distribution of hash values is unknown.

Rationale for the pseudo-random threshold?

Posted May 6, 2026 15:56 UTC (Wed) by daroc (editor, #160859) [Link] (2 responses)

Rationale for the pseudo-random threshold?

Posted May 6, 2026 23:01 UTC (Wed) by Wol (subscriber, #4433) [Link] (1 responses)

Rationale for the pseudo-random threshold?

Posted May 15, 2026 19:33 UTC (Fri) by zachmu (guest, #183605) [Link]