Ask about Geth: Snapshot acceleration | Ethereum Foundation Blog
2020-07-17 · via Ethereum Foundation Blog

*This is part #1 of a series where anyone can ask questions about Geth and I'll attempt to answer the highest voted one each week with a mini writeup. This week's highest voted question was: Could you share how the flat db structure is different from the legacy structure?*

State in Ethereum

Before diving into an acceleration structure, let's recap a bit what Ethereum calls state and how it is stored currently at its various levels of abstraction.

Ethereum maintains two different types of state: the set of accounts; and a set of storage slots for each contract account. From a purely abstract perspective, both of these are simple key/value mappings. The set of accounts maps addresses to their nonce, balance, etc. A storage area of a single contract maps arbitrary keys - defined and used by the contract - to arbitrary values.
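As a rough illustration of the two mappings, here is a minimal Python sketch. The `Account` class, the short addresses and the values are hypothetical stand-ins, not Geth's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    nonce: int = 0
    balance: int = 0
    # Storage area of a contract: arbitrary keys mapped to arbitrary values.
    storage: dict = field(default_factory=dict)


# The set of accounts: address -> account data (addresses here are made up).
accounts = {}
accounts["0xaaaa"] = Account(nonce=1, balance=10**18)
accounts["0xbbbb"] = Account()  # pretend this one is a contract account
accounts["0xbbbb"].storage[bytes(32)] = (42).to_bytes(32, "big")

print(accounts["0xaaaa"].balance)                  # 1000000000000000000
print(accounts["0xbbbb"].storage[bytes(32)][-1])   # 42
```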

Unfortunately, whilst storing these key-value pairs as flat data would be very efficient, verifying their correctness becomes computationally intractable. Every time a modification is made, we'd need to hash all that data from scratch.

Instead of hashing the entire dataset all the time, we could split it up into small contiguous chunks and build a tree on top! The original useful data would be in the leaves, and each internal node would be a hash of everything below it. This would allow us to only recalculate a logarithmic number of hashes when something is modified. This data structure actually has a name: it's the famous Merkle tree.
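To make the logarithmic claim concrete, here is a minimal sketch of a binary Merkle tree in Python. It assumes a power-of-two number of chunks and uses SHA-256 as a stand-in for Ethereum's Keccak-256; the class and method names are illustrative only.

```python
import hashlib


def h(data):
    """Hash helper; SHA-256 stands in for Keccak-256 in this sketch."""
    return hashlib.sha256(data).digest()


class MerkleTree:
    """Binary Merkle tree over a fixed, power-of-two number of chunks."""

    def __init__(self, chunks):
        n = len(chunks)
        assert n > 0 and n & (n - 1) == 0, "chunk count must be a power of two"
        self.n = n
        # nodes[1] is the root; the leaves live at nodes[n] .. nodes[2n - 1].
        self.nodes = [b""] * (2 * n)
        for i, chunk in enumerate(chunks):
            self.nodes[n + i] = h(chunk)
        for i in range(n - 1, 0, -1):
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])

    @property
    def root(self):
        return self.nodes[1]

    def update(self, index, chunk):
        """Replace one chunk and return how many hashes were recomputed."""
        i = self.n + index
        self.nodes[i] = h(chunk)
        rehashed = 1
        i //= 2
        while i >= 1:  # walk up the single path from the leaf to the root
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])
            rehashed += 1
            i //= 2
        return rehashed


chunks = [f"chunk-{i}".encode() for i in range(1024)]
tree = MerkleTree(chunks)
old_root = tree.root
rehashed = tree.update(5, b"modified")
print(rehashed)  # 11: one leaf plus ten internal nodes, not all 1024 chunks
```

Note that the sketch only supports modifying an existing chunk; it has no way to insert or delete one without rebuilding the tree.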

Unfortunately, we still fall a bit short on the computational complexity. The above Merkle tree layout is very efficient at incorporating modifications to existing data, but insertions and deletions shift the chunk boundaries and invalidate all the calculated hashes.

Instead of blindly chunking up the dataset, we could use the keys themselves to organize the data into a tree format based on common prefixes! This way an insertion or deletion wouldn't shift all nodes, but would change just the logarithmic path from root to leaf. This data structure is called a Patricia tree.

Combine the two ideas - the tree layout of a Patricia tree and the hashing algorithm of a Merkle tree - and you end up with a Merkle Patricia tree, the actual data structure used to represent state in Ethereum. Guaranteed logarithmic complexity for modifications, insertions, deletions and verification! A tiny extra is that keys are hashed before insertion to balance the tries.
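The combination can be sketched as a toy in Python. This is not Ethereum's actual Merkle Patricia trie: it has no extension nodes and no RLP encoding, it uses SHA-256 instead of Keccak-256, and it recomputes hashes on demand rather than caching them per node (a real implementation caches them so that only the modified path is rehashed). All names are hypothetical.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def nibbles(key):
    """Hash the key, then split the digest into 4-bit path elements."""
    return [n for byte in h(key) for n in (byte >> 4, byte & 0x0F)]


class Leaf:
    def __init__(self, path, value):
        self.path, self.value = path, value

    def hash(self):
        return h(b"leaf" + bytes(self.path) + self.value)


class Branch:
    def __init__(self):
        self.children = [None] * 16

    def hash(self):
        # An internal node commits to the hashes of everything below it.
        return h(b"branch" + b"".join(c.hash() if c else bytes(32) for c in self.children))


def insert(node, path, value, depth=0):
    """Return the subtree rooted at `node` with (path, value) added or replaced."""
    if node is None:
        return Leaf(path, value)
    if isinstance(node, Leaf):
        if node.path == path:
            return Leaf(path, value)
        branch = Branch()
        branch.children[node.path[depth]] = node  # push the old leaf one level down
        node = branch
    idx = path[depth]
    node.children[idx] = insert(node.children[idx], path, value, depth + 1)
    return node


def get(node, path):
    depth = 0
    while isinstance(node, Branch):
        node = node.children[path[depth]]
        depth += 1
    return node.value if node is not None and node.path == path else None


def build(items):
    root = None
    for key, value in items:
        root = insert(root, nibbles(key), value)
    return root


accounts = [(f"account-{i}".encode(), f"balance-{i}".encode()) for i in range(200)]
root_a = build(accounts)
root_b = build(reversed(accounts))
print(root_a.hash() == root_b.hash())  # True: the layout depends on the keys, not on insertion order
```

Because each key's position is fixed by its hashed prefix, an insertion only touches the nodes along that one path, and the root hash is the same regardless of the order in which the data arrived.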

State storage in Ethereum

The above description explains why Ethereum stores its state in a Merkle Patricia tree. Alas, as fast as the desired operations got, every choice is a trade-off. The cost of logarithmic updates and logarithmic verification is logarithmic reads and logarithmic storage for every individual key. This is because every internal trie node needs to be saved to disk individually.

I do not have an accurate number for the depth of the account trie at the moment, but about a year ago we were saturating the depth of 7. This means that every trie operation (e.g. read balance, write nonce) touches at least 7-8 internal nodes, and thus does at least 7-8 persistent database accesses. LevelDB also organizes its data into a maximum of 7 levels, so there's an extra multiplier from there. The net result is that a single state access is expected to amplify into 25-50 random disk accesses. Multiply this by all the state reads and writes that the transactions in a block touch and you get a scary number.
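As a back-of-the-envelope check, the arithmetic can be written out in Python. The 4x-5x LevelDB multiplier is an assumption borrowed from the figure quoted later in this post, and the number of state operations per block is purely hypothetical.

```python
trie_nodes = (7, 8)         # database accesses per trie operation, from the text
leveldb_factor = (4, 5)     # assumed multiplier from LevelDB's levels
state_ops_per_block = 1000  # hypothetical figure, for illustration only

low = trie_nodes[0] * leveldb_factor[0]
high = trie_nodes[1] * leveldb_factor[1]
print(low, high)  # 28 40
print(low * state_ops_per_block, high * state_ops_per_block)  # 28000 40000
```

Under these assumptions the result lands inside the 25-50 range quoted above, and a block with a thousand state operations would already mean tens of thousands of random disk accesses.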

[Of course all client implementations try their best to minimize this overhead. Geth uses large memory areas for caching trie nodes; and also uses in-memory pruning to avoid writing to disk nodes that get deleted anyway after a few blocks. That's for a different blog post however.]

As horrible as these numbers are, these are the costs of operating an Ethereum node and having the capability of cryptographically verifying all state at all times. But can we do better?

Not all accesses are created equal

Ethereum relies on cryptographic proofs for its state. There is no way around the disk amplifications if we want to retain our capability to verify all the data. That said, we can - and do - trust the data we've already verified.

There is no point in verifying and re-verifying every state item every time we pull it up from disk. The Merkle Patricia tree is essential for writes, but it's an overhead for reads. We cannot get rid of it, and we cannot slim it down; but that doesn't mean we must necessarily use it everywhere.

An Ethereum node accesses state in a few different places:

  • When importing a new block, EVM code execution does a more-or-less balanced number of state reads and writes. A denial-of-service block might however do significantly more reads than writes.
  • When a node operator retrieves state (e.g. eth_call and family), EVM code execution only does reads (it can write too, but those get discarded at the end and are not persisted).
  • When a node is synchronizing, it is requesting state from remote nodes that need to dig it up and serve it over the network.

Based on the above access patterns, if we can short circuit reads not to hit the state trie, a slew of node operations will become significantly faster. It might even enable some novel access patterns (like state iteration) which were prohibitively expensive before.

Of course, there's always a trade-off. Without getting rid of the trie, any new acceleration structure is extra overhead. The question is whether the additional overhead provides enough value to warrant it.

Back to the roots

We've built this magical Merkle Patricia tree to solve all our problems, and now we want to get around it for reads. What acceleration structure should we use to make reads fast again? Well, if we don't need the trie, we don't need any of the complexity introduced. We can go all the way back to the origins.

As mentioned in the beginning of this post, the theoretical ideal data storage for Ethereum's state is a simple key-value store (separate for accounts and each contract). Without the constraints of the Merkle Patricia tree however, there's "nothing" stopping us from actually implementing the ideal solution!

A while back Geth introduced its snapshot acceleration structure (not enabled by default). A snapshot is a complete view of the Ethereum state at a given block. Abstract implementation wise, it is a dump of all accounts and storage slots, represented by a flat key-value store.

Whenever we wish to access an account or storage slot, we only pay 1 LevelDB lookup instead of 7-8 as per the trie. Updating the snapshot is also simple in theory: after processing a block we do 1 extra LevelDB write per updated slot.

The snapshot essentially reduces reads from O(log n) to O(1) (times LevelDB overhead) at the cost of increasing writes from O(log n) to O(1 + log n) (times LevelDB overhead) and increasing disk storage from O(n log n) to O(n + n log n).
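The difference in lookup counts can be sketched with a highly stylized Python model. `CountingDB` is a dict-backed stand-in for LevelDB, the trie is reduced to one stored node per level at an assumed depth of 7, and SHA-256 replaces Keccak-256; none of this mirrors Geth's real database schema.

```python
import hashlib

DEPTH = 7  # assumed number of internal trie levels, following the estimate in the post


class CountingDB:
    """Dict-backed stand-in for LevelDB that counts reads and writes."""

    def __init__(self):
        self.data, self.reads, self.writes = {}, 0, 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def put(self, key, value):
        self.writes += 1
        self.data[key] = value


def trie_put(db, key, value):
    path = hashlib.sha256(key).hexdigest()[:DEPTH]
    for d in range(DEPTH):
        db.put(("node", path[:d]), "internal")  # one stored node per level
    db.put(("node", path), value)               # the leaf


def trie_get(db, key):
    path = hashlib.sha256(key).hexdigest()[:DEPTH]
    for d in range(DEPTH):
        if db.get(("node", path[:d])) is None:
            return None
    return db.get(("node", path))


def snap_put(db, key, value):
    db.put(("snap", hashlib.sha256(key).digest()), value)


def snap_get(db, key):
    return db.get(("snap", hashlib.sha256(key).digest()))


db = CountingDB()
trie_put(db, b"alice", 100)
snap_put(db, b"alice", 100)
print(db.writes)  # 9: eight trie nodes plus one extra snapshot entry

db.reads = 0
trie_get(db, b"alice")
trie_reads = db.reads
db.reads = 0
snap_get(db, b"alice")
snap_reads = db.reads
print(trie_reads, snap_reads)  # 8 1
```

The snapshot entry is pure extra cost on the write side, which is the trade-off described above: reads drop to a single lookup while every update pays for both structures.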

Devil's in the details

Maintaining a usable snapshot of the Ethereum state has its complexity. As long as blocks are coming one after the other, always building on top of the last, the naive approach of merging changes into the snapshot works. If there's a mini reorg however (even a single block), we're in trouble, because there's no undo. Persistent writes are a one-way operation for a flat data representation. To make matters worse, accessing older state (e.g. 3 blocks old for some DApp or 64+ for fast/snap sync) is impossible.

To overcome this limitation, Geth's snapshot is composed of two entities: a persistent disk layer that is a complete snapshot of an older block (e.g. HEAD-128); and a tree of in-memory diff layers that gather the writes on top.

Whenever a new block is processed, we do not merge the writes directly into the disk layer, rather just create a new in-memory diff layer with the changes. If enough in-memory diff layers are piled on top, the bottom ones start getting merged together and eventually pushed to disk. Whenever a state item is to be read, we start at the topmost diff layer and keep going backwards until we find it or reach the disk.
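The layering described here can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Geth's implementation: `DiskLayer`, `DiffLayer` and `flatten_oldest` are hypothetical names, deletions are not handled, and flattening only considers a single chain of diffs.

```python
class DiskLayer:
    """Stand-in for the persistent layer: a complete flat snapshot of an older block."""

    def __init__(self, data):
        self.data = dict(data)

    def get(self, key):
        return self.data.get(key)


class DiffLayer:
    """In-memory writes of one block, stacked on top of a parent layer."""

    def __init__(self, parent, writes):
        self.parent = parent
        self.writes = dict(writes)

    def get(self, key):
        layer = self
        while isinstance(layer, DiffLayer):  # walk from the newest diff downwards
            if key in layer.writes:
                return layer.writes[key]
            layer = layer.parent
        return layer.get(key)                # reached the disk layer


def flatten_oldest(head, max_diffs):
    """Merge the oldest diffs of one chain into the disk layer, keeping max_diffs (>= 1)."""
    chain = []
    layer = head
    while isinstance(layer, DiffLayer):
        chain.append(layer)                  # newest first
        layer = layer.parent
    disk = layer
    excess = chain[max_diffs:]
    for diff in reversed(excess):            # apply the oldest writes first
        disk.data.update(diff.writes)
    if excess:
        chain[max_diffs - 1].parent = disk


disk = DiskLayer({"alice": 10})
block1 = DiffLayer(disk, {"alice": 7, "bob": 3})
block2 = DiffLayer(block1, {"bob": 5})
block2_alt = DiffLayer(block1, {"bob": 4})   # one-block reorg: a sibling on the same parent
block3 = DiffLayer(block2, {"carol": 1})

print(block3.get("alice"), block3.get("bob"))  # 7 5
print(block2_alt.get("bob"))                   # 4
flatten_oldest(block3, 2)
print(disk.data)                               # {'alice': 7, 'bob': 3}
```

Since every diff only points at its parent, a competing block is just another child of the same layer, and nothing has to be undone on disk until a diff is actually flattened.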

This data representation is very powerful as it solves a lot of issues. Since the in-memory diff layers are assembled into a tree, reorgs shallower than 128 blocks can simply pick the diff layer belonging to the parent block and build forward from there. DApps and remote syncers needing older state have access to the 128 most recent ones. The cost does increase by 128 map lookups, but 128 in-memory lookups are orders of magnitude faster than 8 disk reads amplified 4x-5x by LevelDB.

Of course, there are lots and lots of gotchas and caveats. Without going into too much detail, a quick list of the finer points:

  • Self-destructs (and deletions) are special beasts as they need to short circuit diff layer descent.
  • If there is a reorg deeper than the persistent disk layer, the snapshot needs to be completely discarded and regenerated. This is very expensive.
  • On shutdown, the in-memory diff layers need to be persisted into a journal and loaded back up, otherwise the snapshot will become useless on restart.
  • Use the bottom-most diff layer as an accumulator and only flush to disk when it exceeds some memory usage. This allows deduping writes for the same slots across blocks.
  • Allocate a read cache for the disk layer so that contracts accessing the same ancient slot over and over don't cause disk hits.
  • Use cumulative bloom filters in the in-memory diff layers to quickly detect whether there's a chance for an item to be in the diffs, or if we can go to disk immediately.
  • The keys are not the raw data (account address, storage key), rather the hashes of these, ensuring the snapshot has the same iteration order as the Merkle Patricia tree.
  • Generating the persistent disk layer takes significantly more time than the pruning window for the state tries, so even the generator needs to dynamically follow the chain.
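Two of the points above, deletions short circuiting the descent and cumulative bloom filters, can be illustrated with a small Python sketch. The `DELETED` marker, the `Bloom` and `Diff` classes and the filter parameters (2048 bits, 3 hashes) are all assumptions made for this example and do not reflect Geth's actual code or tuning.

```python
import hashlib

DELETED = object()  # tombstone: the item was removed in this layer, stop descending


class Bloom:
    """Tiny bloom filter: no false negatives, occasional false positives."""

    def __init__(self, bits=0, size=2048, hashes=3):
        self.bits, self.size, self.hashes = bits, size, hashes

    def _positions(self, key):
        digest = hashlib.sha256(key).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.size
                for i in range(self.hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def may_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Diff:
    """Diff layer whose parent is another Diff or the disk layer (a plain dict here)."""

    def __init__(self, parent, writes):
        self.parent, self.writes = parent, dict(writes)
        below = isinstance(parent, Diff)
        self.disk = parent.disk if below else parent
        # Cumulative filter: covers this layer and every diff layer below it.
        self.bloom = Bloom(parent.bloom.bits if below else 0)
        for key in self.writes:
            self.bloom.add(key)

    def get(self, key):
        if not self.bloom.may_contain(key):
            return self.disk.get(key)        # definitely not in any diff: go to disk
        layer = self
        while isinstance(layer, Diff):
            if key in layer.writes:
                value = layer.writes[key]
                return None if value is DELETED else value
            layer = layer.parent
        return self.disk.get(key)            # the filter gave a false positive


disk = {b"alice": 10, b"bob": 20}
layer1 = Diff(disk, {b"bob": DELETED})
layer2 = Diff(layer1, {b"carol": 30})

print(layer2.get(b"bob"))    # None: the tombstone hides the stale value on disk
print(layer2.get(b"alice"))  # 10
print(layer2.get(b"carol"))  # 30
```

Without the tombstone, the lookup for the deleted item would fall through to the disk layer and return a value that no longer exists at the current block.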

The good, the bad, the ugly

Geth's snapshot acceleration structure reduces state read complexity by about an order of magnitude. This means read-based DoS gets an order of magnitude harder to pull off; and eth_call invocations get an order of magnitude faster (if not CPU bound).

The snapshot also enables blazing fast state iteration of the most recent blocks. This was actually the main reason for building snapshots, as it permitted the creation of the new snap sync algorithm. Describing that is an entirely new blog post, but the latest benchmarks on Rinkeby speak volumes:

[Figure: Rinkeby snap sync]

Of course, the trade-offs are always present. After initial sync is complete, it takes about 9-10h on mainnet to construct the initial snapshot (it's maintained live afterwards) and it takes about 15+GB of additional disk space.

As for the ugly part? Well, it took us over 6 months to feel confident enough about the snapshot to ship it, but even now it's behind the --snapshot flag and there's still tuning and polishing to be done around memory usage and crash recovery.

All in all, we're very proud of this improvement. It was an insane amount of work and it was a huge shot in the dark implementing everything and hoping it would work out. Just as a fun fact, the first version of snap sync (leaf sync) was written 2.5 years ago and was blocked ever since because we lacked the necessary acceleration to saturate it.

Epilogue

Hope you enjoyed this first post of Ask about Geth. It took me about twice as long to finish as I aimed for, but I felt the topic deserved the extra time. See you next week.

[PS: I deliberately didn't link the asking/voting website into this post as I'm sure it's a temporary thing and I don't want to leave broken links for posterity; nor have someone buy the name and host something malicious in the future. You can find it among my Twitter posts.]