惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

N
News and Events Feed by Topic
Malwarebytes
Malwarebytes
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
C
Cybersecurity and Infrastructure Security Agency CISA
F
Future of Privacy Forum
C
Cisco Blogs
T
The Exploit Database - CXSecurity.com
A
Arctic Wolf
S
Securelist
K
Kaspersky official blog
S
Schneier on Security
T
ThreatConnect
T
Tenable Blog
Spread Privacy
Spread Privacy
T
True Tiger Recordings
AWS News Blog
AWS News Blog
F
Fox-IT International blog
量子位
T
Threatpost
V
Vulnerabilities – Threatpost
C
CERT Recently Published Vulnerability Notes
Cisco Talos Blog
Cisco Talos Blog
GbyAI
GbyAI
宝玉的分享
宝玉的分享
腾讯CDC
G
Google Developers Blog
aimingoo的专栏
aimingoo的专栏
Cyberwarzone
Cyberwarzone
有赞技术团队
有赞技术团队
S
SegmentFault 最新的问题
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
U
Unit 42
雷峰网
雷峰网
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Simon Willison's Weblog
Simon Willison's Weblog
O
OpenAI News
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The GitHub Blog
The GitHub Blog
The Register - Security
The Register - Security
MyScale Blog
MyScale Blog
小众软件
小众软件
A
About on SuperTechFans
Last Week in AI
Last Week in AI
Y
Y Combinator Blog
博客园 - 三生石上(FineUI控件)
美团技术团队
Google Online Security Blog
Google Online Security Blog
P
Proofpoint News Feed
MongoDB | Blog
MongoDB | Blog

Lobsters

EYG: Host of CLI improvements, new guides and new effects The Eternal Sloptember JS Crossword C array types are weird; and related topics Flatpak will depend on systemd – OSnews Migrating from Go to Rust | corrode Rust Consulting Building Pi With Pi abyss * your_dotfiles_are_not_a_distro Vivado Licensing Options How my minimal, memory-safe Go rsync steers clear of vulnerabilities From AFSK to Goertzel the entropy layer of a wavelet codec, on its own 10,000 Lines Later: When a Tool Became a Compiler - Rob Durst - Gleam Gathering 2026 Debian SE Linux and PinTheft fht-compositor: A dynamic tiling Wayland compositor A Network Allow-List Won't Stop Exfiltration — André Graf Does bulk memmove speed up std::remove_if? (No.) What is Git made of? wake up! 16b 声明式部分更新 | Blog | Chrome for Developers Don't Roll Your Own ... Dianne Skoll's Web Site - Remind “Long-Term Support” doesn’t mean what you think The Architecture of Open Source Applications (Volume 1)Berkeley DB Pardon MIE? - ironPeak Blog seriot.ch It's time to talk about my writerdeck hershey Cuneiforth: A Forth for your Chifir z386: An Open-Source 80386 Built Around Original Microcode waylandcraft - Minecraft Mod On the <dl> HP QuickWeb, Singular And Pointless mvm - a fast virtual machine for Go That one time I used Go panics for flow control A new suite of modern tools coming for editing and publishing RFCs From the Tabletop… The Digital Antiquarian .NET (OK, C#) finally gets union types🎉: Exploring the .NET 11 preview - Part 2 Revised^7 Report on Scheme, Large: Procedural Fascicle Draft is now public The Soul of Maintaining a New Machine - Third Draft | Books in Progress
Scaling Akvorado BMP RIB with sharding
vincent.bern · 2026-05-25 · via Lobsters

To associate routing information—like AS paths or BGP communities—to flows, Akvorado can import routes through the BGP Monitoring Protocol (BMP). As the Internet routing table contains more than 1 million routes, Akvorado needs to scale to tens of millions of routes.1 This has been a long-standing challenge,2 but I expect this issue is now fixed by using RIB sharding, a method that splits the routing database into several parts to enable concurrent updates.

Previous implementation#

Akvorado connects 2 elements to build its RIB:

  1. a prefix tree, and
  2. a list of routes attached to each prefix.

Akvorado BMP RIB implementation before sharding with the memory layout of each
structure and a single lock.

Akvorado BMP RIB implementation without sharding. One single read/write lock.

In the diagram above, the RIB stores five IPv4 prefixes and two IPv6 prefixes. One of them, 2001:db8:1::/48, contains three routes:

  • from peer 3, next hop 2001:db8::3:1, AS 65402, AS path 65402, community 65402:31,
  • from peer 4, next hop 2001:db8::4:1, same ASN, AS path, and community,
  • from peer 5, next hop 2001:db8::5:1, AS 65402, AS path 65401 65402, community 65402:31.

The rib structure is defined in Go as follows:

type rib struct {
    tree          *bart.Table[prefixIndex]
    routes        map[routeKey]route
    nlris         *intern.Pool[nlri]
    nextHops      *intern.Pool[nextHop]
    rtas          *intern.Pool[routeAttributes]
    nextPrefixID  prefixIndex
    freePrefixIDs []prefixIndex
}

The prefix tree uses the bart package, an adaptation of Donald Knuth’s ART algorithm. The benchmarks demonstrate it outperforms other packages for lookups, insertions, and memory usage.3 Plus, the author is quite helpful.

Storing routes in a map#

The list of routes for each prefix is not stored directly in the prefix tree: it would put too much pressure on the garbage collector by allocating per-prefix arrays.

Instead, the RIB assigns a unique 32-bit prefix identifier for each prefix, either by picking the last available prefix identifier from the freePrefixIDs array if any, or using the nextPrefixID value before incrementing it. Then, the routes are stored in the routes map, leveraging the optimized Swiss table in Go. To retrieve routes attached to a prefix, we look them up one by one in the routes map with a 64-bit key combining the 32-bit prefix index with a 32-bit route index matching the position of the route in the list. Akvorado scans routes from the first to the last to find the best one.4 It knows there is no more route if the route key returns no result.

type prefixIndex uint32
type routeIndex uint32
type routeKey uint64

Interning routes#

A route contains a BGP peer identifier, a partial NLRI5, the next hop, and the attributes.

type route struct {
    peer       uint32
    nlri       intern.Reference[nlri]
    nextHop    intern.Reference[nextHop]
    attributes intern.Reference[routeAttributes]
    prefixLen  uint8
}

type nlri struct {
    family bgp.Family
    path   uint32
    rd     RD
}
type nextHop netip.Addr
type routeAttributes struct {
    asn              uint32
    asPath           []uint32
    communities      []uint32
    largeCommunities []bgp.LargeCommunity
}

To save memory and allocations, NLRI, next hops, and route attributes are “interned:” a 32-bit integer replaces the real value. The mechanism predates the unique package introduced in Go 1.23. We keep it because it has different trade-offs:

  • It uses explicit reference counting instead of relying on weak pointers.
  • It works with non-comparable values implementing Hash() and Equal() methods.6
  • It uses explicit pool instances. This will be useful for sharding.
  • It has better performance. See for example this benchmark.
  • It consumes half the memory thanks to unsigned 32-bit references instead of pointers.
  • But it is not safe for concurrent use.

Why does it not scale?#

The global read/write lock is a bottleneck in this implementation. But how? There are several users of the RIB, each with its own set of constraints:

  • The Kafka workers look up the RIB to enrich flows with routing information. They are bound by the number of Kafka partitions.8 Akvorado also adjusts their number to ensure efficient batching to ClickHouse. On our setup, the number of workers oscillates between 8 and 16. As we want to observe the latest data, we cannot afford for the Kafka workers to lag too much.

  • The monitored routers send route updates through the BMP protocol. When connecting, they can send millions of routes.9 After the initial synchronization, updates are sent continuously and may spike from time to time. The router detects a stuck BMP station when its TCP window is full and resets the session in this case. While Akvorado implements a large incoming buffer, it still needs to update the received routes with the write lock held fast enough to avoid being detected as stuck.

  • When a remote BGP peer goes down, Akvorado flushes the associated routes by walking the RIB with the write lock held. When a monitored router goes down, Akvorado waits a bit but eventually flushes all the associated routes.

In short: on a busy setup, lock contention is high for both readers and writers, and neither can lag too much behind.

RIB sharding#

First step: basic sharding#

To remove the global lock, the RIB is split into several “shards,” each one handling a subset of the prefixes:

Akvorado BMP RIB implementation after sharding with the memory layout of each
structure and one lock per shard.

Akvorado BMP RIB implementation with sharding.

The prefix tree stays global and is protected by a single lock. Each shard gets its read/write lock, its route map, and its intern pools to store NLRIs, next hops, and route attributes, which would not have been possible with Go’s unique package. The prefix indexes are also sharded: the 8 most significant bits are the shard index and the 24 remaining bits are the local prefix index.

Gerhard confirmed that after this blind change, the BMP receiver chugged steadily. 🎉

Later, I wrote a concurrent benchmark over half a million synthetic but plausible routes10 partitioned over 0 to 8 writers, churning routes as fast as possible, while 1 to 16 readers continuously look up a set of 10,000 routes. I don’t know if this benchmark is realistic, but it confirms the improvements for both read and write latencies:

Two heatmaps. One for read latency ratio, the other for write latency ratio.
Both of them comparing the speedup with colored tiles between the code before
sharding and after sharding. Most tiles are
green.

Read and write latency performance improvement after sharding.

It also shows that a high number of writers degrades read latency.

Second step: lock-free reads#

The single read/write lock protecting the prefix tree is the next target. The bart package provides alternative mutation methods returning an updated tree using copy-on-write. Readers don’t need the global lock any more, leaving it only to synchronize writers. The prefix tree is boxed in an atomic pointer.

Akvorado BMP RIB implementation for sharding with lock-free reads. It shows
the memory layout of each structure.

Akvorado BMP RIB implementation with sharding and lock-free reads.

Without a lock, readers can now fetch a stale prefix index when walking their copy of the tree if a concurrent writer removes the last route attached to this prefix index and recycles it for another prefix. To avoid this issue, we combine the prefix index with a generation number and store them in the tree:

type generation uint32
type prefixRef struct {
    idx prefixIndex
    gen generation
}
type rib struct {
    mu     sync.Mutex
    tree   atomic.Pointer[bart.Table[prefixRef]]
    shards []*ribShard
}

Each shard stores the generation number for each local prefix index. The generation number increases by one if the associated prefix index is freed. When looking up the routes attached to a prefix index, the reader checks if the generation number matches. Otherwise, it assumes the index was recycled and the list of routes is empty.11 You can see this case in the diagram above for prefix index 5, stored with a generation index of 3, while the current value in the []generations array is 4. The generation number could overflow, but it is not a problem as lookups are quick.

Running the concurrent benchmark against this new implementation shows the improvements for the read latency as soon as the cost of the copy-on-write prefix tree is amortized.

Six heatmaps. Three for read latency ratio, three others for write latency
ratio. They compare the numbers without sharding, with sharding, and with
lock-free reads, pair by pair. For read latency, most tiles are green, showing
an improvement of the second step. For write latency, the speedup is negative
for a low number of readers.

Read and write latency performance improvement after lock-free reads. The middle column shows the cumulative improvements of both steps.

Among the multiple attempts to optimize the BMP component, RIB sharding is one of the more satisfying. Akvorado 2.2 implements the first step. PR #2433, drafted while writing this blog post, implements the second step and will be released with Akvorado 2.4. 🪓