SwanLake: An Arrow Flight SQL Datalake Service Built on DuckDB + DuckLake
2026-02-21 · via Wang Fenjin

After handing duckdb-rs over to the DuckDB team in 2023, one question kept coming back to me:

If DuckDB is already great in-process, how do we turn that power into a service that is easier to integrate, deploy, and operate?

SwanLake is my answer to that question.

It is a Rust-based Arrow Flight SQL server, powered by DuckDB, with DuckLake-oriented extensions for datalake scenarios. In practice, SwanLake is built around a three-part combination: DuckDB + DuckLake + Flight SQL.

[Figure: SwanLake project overview]

Why I started SwanLake

With duckdb-rs, the primary goal was clear: make DuckDB feel natural in Rust. That part worked well, but new constraints became obvious:

  1. Most teams are not single-language; they need one service interface across stacks.
  2. Real workloads involve object storage, metadata services, and multiple cooperating systems.
  3. Production systems need observability, not just logs.

So SwanLake was never meant to be “just another wrapper”: I wanted a practical entry point for analytics as a service.

Architecture

You can read SwanLake as a five-layer system:

1) Access Layer: Arrow Flight SQL (gRPC)

All queries and updates enter through Flight SQL, which streams results as Arrow record batches over gRPC. The protocol is efficient and language-neutral, and the Rust, Go, and Python examples in the repo exercise this layer directly.

2) Session Layer: Session Registry

swanlake-core manages connection-scoped sessions:

  1. session IDs are created/reused from peer_addr or peer_ip,
  2. prepared statements, transactions, and temp objects remain session-affine,
  3. max sessions + idle timeout protect server resources.
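A minimal Rust sketch of the session rules above. It assumes three things: sessions are keyed by the client's peer address string, the registry is capped at `max_sessions`, and sessions idle longer than `idle_timeout` are evicted before a new one is admitted. `SessionRegistry`, `get_or_create`, and the `session-<peer>` ID format are hypothetical names for illustration, not swanlake-core's actual API. A real session would also own its DuckDB connection, prepared statements, and transaction state.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-session state (hypothetical). A real session would also hold an
/// isolated DuckDB connection, prepared statements, and temp objects.
#[derive(Debug)]
struct Session {
    id: String,
    last_used: Instant,
}

#[derive(Debug, PartialEq)]
enum SessionError {
    TooManySessions,
}

/// Sessions keyed by peer address, capped at `max_sessions`,
/// and evicted after `idle_timeout` without use.
struct SessionRegistry {
    sessions: HashMap<String, Session>,
    max_sessions: usize,
    idle_timeout: Duration,
}

impl SessionRegistry {
    fn new(max_sessions: usize, idle_timeout: Duration) -> Self {
        Self { sessions: HashMap::new(), max_sessions, idle_timeout }
    }

    /// Drop every session idle for longer than `idle_timeout` as of `now`.
    fn evict_idle(&mut self, now: Instant) {
        let timeout = self.idle_timeout;
        self.sessions
            .retain(|_, s| now.saturating_duration_since(s.last_used) <= timeout);
    }

    /// Reuse the session for `peer` or create one; reject new peers when full.
    fn get_or_create(&mut self, peer: &str, now: Instant) -> Result<&Session, SessionError> {
        self.evict_idle(now);
        if !self.sessions.contains_key(peer) && self.sessions.len() >= self.max_sessions {
            return Err(SessionError::TooManySessions);
        }
        let session = self
            .sessions
            .entry(peer.to_string())
            .or_insert_with(|| Session { id: format!("session-{peer}"), last_used: now });
        session.last_used = now; // touching the session resets its idle clock
        Ok(&*session)
    }

    fn len(&self) -> usize {
        self.sessions.len()
    }
}
```

The caller passes `now` in explicitly instead of the registry calling `Instant::now()` itself. This keeps the idle-timeout logic deterministic and easy to test.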

3) Execution Layer: DuckDB

I did not build a new engine. SwanLake wraps DuckDB for service use: each session has an isolated connection, startup preloads ducklake/httpfs/aws/postgres extensions, and SWANLAKE_DUCKLAKE_INIT_SQL can inject bootstrap SQL.
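The startup behavior can be sketched as a list of statements that a fresh session runs before it serves queries. This sketch assumes each preloaded extension is installed and loaded with DuckDB's `INSTALL`/`LOAD` statements, and that the contents of SWANLAKE_DUCKLAKE_INIT_SQL run afterwards. `bootstrap_statements` and `init_sql_from_env` are hypothetical helpers. The post also does not say whether the init SQL runs once at startup or once per session.

```rust
/// Extensions the post says SwanLake preloads at startup.
const PRELOADED_EXTENSIONS: [&str; 4] = ["ducklake", "httpfs", "aws", "postgres"];

/// Statements a new session could run before serving queries (hypothetical):
/// INSTALL/LOAD for each preloaded extension, then optional bootstrap SQL.
fn bootstrap_statements(init_sql: Option<&str>) -> Vec<String> {
    let mut stmts: Vec<String> = PRELOADED_EXTENSIONS
        .iter()
        .flat_map(|ext| [format!("INSTALL {ext};"), format!("LOAD {ext};")])
        .collect();
    // Blank or whitespace-only init SQL is treated as "not set".
    if let Some(sql) = init_sql.map(str::trim).filter(|s| !s.is_empty()) {
        stmts.push(sql.to_string());
    }
    stmts
}

/// Read the bootstrap SQL from the environment; unset means none.
fn init_sql_from_env() -> Option<String> {
    std::env::var("SWANLAKE_DUCKLAKE_INIT_SQL").ok()
}
```

At startup, a server built this way would call `bootstrap_statements(init_sql_from_env().as_deref())` and execute each statement on the new connection.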

4) Datalake Layer: DuckLake

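DuckLake is the key piece. Without it, DuckDB is mainly an excellent local analytical engine. DuckLake keeps table metadata in a catalog database (for example Postgres) and stores data as Parquet files on local disk or object storage. This keeps metadata and object-storage paths organized consistently, and that consistency is what makes DuckDB-based datalake services practical.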
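In DuckDB, a DuckLake catalog is attached with `ATTACH 'ducklake:<metadata>' AS <alias> (DATA_PATH '<path>')`. The metadata part names the catalog database, and DATA_PATH points at local storage or object storage. The helper below is a hypothetical sketch that renders such a statement, for example as a value for SWANLAKE_DUCKLAKE_INIT_SQL. The benchmark target names postgres_local_file and postgres_s3 suggest a Postgres catalog paired with a local or S3 data path, but that reading is an assumption; the post does not state it.

```rust
/// Quote a value as a SQL string literal by doubling embedded single quotes.
fn sql_string_literal(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Render a DuckLake ATTACH statement (hypothetical helper).
/// `metadata`: catalog location, e.g. "metadata.ducklake" or
///             "postgres:dbname=ducklake_catalog host=localhost".
/// `data_path`: where data files live, e.g. "data_files/" or "s3://bucket/lake/".
/// `alias` is assumed to be a plain SQL identifier and is not quoted.
fn ducklake_attach_sql(alias: &str, metadata: &str, data_path: &str) -> String {
    format!(
        "ATTACH {} AS {} (DATA_PATH {});",
        sql_string_literal(&format!("ducklake:{metadata}")),
        alias,
        sql_string_literal(data_path),
    )
}
```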

5) Operations Layer: Metrics + Status + Config

Runtime metrics (latency, slow queries, errors), status endpoints (the / page and status.json), and environment-based configuration (SWANLAKE_* variables) form the operational surface. This layer is what makes the system observable and manageable in production.
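Env-based configuration can be sketched as a struct filled from SWANLAKE_* variables, with defaults for anything missing. Only SWANLAKE_DUCKLAKE_INIT_SQL and the 4215 status-port default come from this post. SWANLAKE_STATUS_PORT, SWANLAKE_MAX_SESSIONS, SWANLAKE_SESSION_IDLE_TIMEOUT_SECS, and the defaults of 100 sessions and 900 seconds are hypothetical placeholders.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Server settings. Only SWANLAKE_DUCKLAKE_INIT_SQL and the 4215 status-port
/// default come from the post; other names and defaults are hypothetical.
#[derive(Debug, PartialEq)]
struct Config {
    status_port: u16,
    max_sessions: usize,
    idle_timeout: Duration,
    ducklake_init_sql: Option<String>,
}

impl Config {
    /// Build a config from (key, value) pairs, keeping only SWANLAKE_* keys and
    /// falling back to defaults for missing or unparsable values.
    fn from_vars<I: IntoIterator<Item = (String, String)>>(vars: I) -> Self {
        let vars: HashMap<String, String> = vars
            .into_iter()
            .filter(|(key, _)| key.starts_with("SWANLAKE_"))
            .collect();
        let num = |key: &str| vars.get(key).and_then(|v| v.trim().parse::<u64>().ok());
        Config {
            // Hypothetical variable name; 4215 is the default from the post.
            status_port: num("SWANLAKE_STATUS_PORT")
                .filter(|&p| p <= u64::from(u16::MAX))
                .map(|p| p as u16)
                .unwrap_or(4215),
            // Hypothetical variable name and default.
            max_sessions: num("SWANLAKE_MAX_SESSIONS").map(|n| n as usize).unwrap_or(100),
            // Hypothetical variable name and default (15 minutes).
            idle_timeout: Duration::from_secs(
                num("SWANLAKE_SESSION_IDLE_TIMEOUT_SECS").unwrap_or(900),
            ),
            // Named in the post: bootstrap SQL run against DuckDB.
            ducklake_init_sql: vars.get("SWANLAKE_DUCKLAKE_INIT_SQL").cloned(),
        }
    }

    /// Read the real process environment.
    fn from_env() -> Self {
        Self::from_vars(std::env::vars())
    }
}
```

Taking the variables as an iterator keeps the parsing testable without mutating the process environment.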

Observability was a first-class requirement

SwanLake has a built-in status page (on port 4215 by default) plus status.json for machine consumption. It exposes:

  1. session counts and idle indicators,
  2. query/update latency stats (avg, p95, p99),
  3. slow query and recent error history.

[Figure: SwanLake status page]

I added this because these are exactly the signals I want when debugging production behavior.
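The latency figures on a status page like this can be computed from a window of recent samples. The sketch below uses the nearest-rank percentile definition and a bounded queue for slow-query history. Both choices are assumptions: the post does not say which percentile method or retention policy SwanLake uses. `SlowQueryLog`, `threshold_ms`, and `capacity` are hypothetical names. Recent error history could use the same bounded-queue pattern.

```rust
use std::collections::VecDeque;

/// Nearest-rank percentile (p in 0..=100) of latency samples in milliseconds.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("latency samples must not be NaN"));
    // Smallest value such that at least p% of samples are at or below it.
    let rank = ((p * sorted.len() as f64) / 100.0).ceil().max(1.0) as usize;
    Some(sorted[rank.min(sorted.len()) - 1])
}

#[derive(Debug, PartialEq)]
struct LatencySummary {
    avg_ms: f64,
    p95_ms: f64,
    p99_ms: f64,
}

/// Average, p95, and p99 over a window of samples; None if the window is empty.
fn summarize(samples: &[f64]) -> Option<LatencySummary> {
    if samples.is_empty() {
        return None;
    }
    Some(LatencySummary {
        avg_ms: samples.iter().sum::<f64>() / samples.len() as f64,
        p95_ms: percentile(samples, 95.0)?,
        p99_ms: percentile(samples, 99.0)?,
    })
}

/// Keeps only the most recent `capacity` queries at or above `threshold_ms`.
struct SlowQueryLog {
    threshold_ms: f64,
    capacity: usize,
    entries: VecDeque<(String, f64)>,
}

impl SlowQueryLog {
    fn new(threshold_ms: f64, capacity: usize) -> Self {
        Self { threshold_ms, capacity, entries: VecDeque::with_capacity(capacity) }
    }

    fn record(&mut self, sql: &str, latency_ms: f64) {
        if self.capacity == 0 || latency_ms < self.threshold_ms {
            return;
        }
        if self.entries.len() == self.capacity {
            self.entries.pop_front(); // drop the oldest entry
        }
        self.entries.push_back((sql.to_string(), latency_ms));
    }
}
```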

How I read the current benchmark data

BENCHMARK.md (a CI artifact dated 2026-02-21) includes TPC-H results at scale factor 0.1 (SF=0.1). In that run, postgres_local_file outperforms postgres_s3.

| Metric (SF=0.1) | postgres_local_file | postgres_s3 |
|---|---:|---:|
| Throughput (req/s) | 10.428 | 4.867 |
| Avg latency (ms) | 382.751 | 818.041 |
| p95 latency (ms) | 829.236 | 1904.023 |
| p99 latency (ms) | 1116.002 | 2661.619 |

This is directionally expected: reading through object storage usually adds both latency and variability.
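Here is a quick worked check on the table using Little's law (requests in flight = throughput × average latency). For the local run, 10.428 req/s × 0.382751 s ≈ 3.99. For S3, 4.867 req/s × 0.818041 s ≈ 3.98. Both results are close to 4, which is consistent with a closed-loop run at roughly four concurrent requests. The post does not state the concurrency, so treat that as an inference. At fixed concurrency, the ~2.1× throughput gap simply mirrors the ~2.1× average-latency gap. Meanwhile, p99/avg rises from about 2.92 to 3.25, so the tail widens more than the mean. The same arithmetic in Rust:

```rust
/// One row of the SF=0.1 table (latencies in milliseconds).
struct BenchRow {
    throughput_rps: f64,
    avg_ms: f64,
    p99_ms: f64,
}

const LOCAL_FILE: BenchRow = BenchRow { throughput_rps: 10.428, avg_ms: 382.751, p99_ms: 1116.002 };
const S3: BenchRow = BenchRow { throughput_rps: 4.867, avg_ms: 818.041, p99_ms: 2661.619 };

/// Little's law: average requests in flight = throughput (req/s) × avg latency (s).
fn in_flight(row: &BenchRow) -> f64 {
    row.throughput_rps * row.avg_ms / 1000.0
}

/// Tail heaviness: how many times the mean the p99 latency is.
fn tail_ratio(row: &BenchRow) -> f64 {
    row.p99_ms / row.avg_ms
}
```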

One practical point is critical here: when using S3 or other remote object storage, you should usually enable cache_httpfs. Otherwise, latency (especially tail latency) can become very unstable.

This is already reflected in the benchmark workflow configuration. See .github/workflows/performance.yml:

  1. postgres_s3 defaults to BENCHBASE_ENABLE_CACHE_HTTPFS=true,
  2. postgres_local_file defaults to BENCHBASE_ENABLE_CACHE_HTTPFS=false,
  3. the workflow input can override this behavior.
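The workflow defaults above amount to a small resolution rule: an explicit input wins, and otherwise the target name decides. This sketch assumes that unknown targets default to off and that cache_httpfs is installed from DuckDB's community extension repository. The function names are hypothetical, and the code is not the actual contents of performance.yml.

```rust
/// Resolve whether cache_httpfs is enabled for a benchmark target.
/// An explicit workflow input wins; otherwise postgres_s3 defaults to on and
/// postgres_local_file (and, by assumption, any other target) to off.
fn cache_httpfs_enabled(target: &str, workflow_input: Option<bool>) -> bool {
    workflow_input.unwrap_or(target == "postgres_s3")
}

/// DuckDB statements that enable the cache_httpfs community extension.
fn cache_httpfs_sql(enabled: bool) -> Vec<&'static str> {
    if enabled {
        vec!["INSTALL cache_httpfs FROM community;", "LOAD cache_httpfs;"]
    } else {
        Vec::new()
    }
}
```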

But I do not think the takeaway is “local is always better”. A better takeaway is:

  1. choose storage tiers based on workload shape,
  2. run repeated benchmarks and track variance,
  3. keep performance visibility continuous, not one-off.
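For point 2, one lightweight way to track variance is to record a metric (for example, p99 latency) for each run. You can then report its mean, sample standard deviation, and coefficient of variation (std_dev / mean). Below is a minimal sketch. `run_stats` is a hypothetical helper, and any threshold for calling a result "unstable" is a policy choice left to the reader.

```rust
/// Spread of one metric (e.g. p99 latency) across repeated benchmark runs.
#[derive(Debug)]
struct RunStats {
    mean: f64,
    std_dev: f64,
    /// Coefficient of variation (std_dev / mean): unitless, comparable across metrics.
    cv: f64,
}

/// Sample mean and sample (n - 1) standard deviation; needs at least two runs.
fn run_stats(values: &[f64]) -> Option<RunStats> {
    if values.len() < 2 {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|&v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let std_dev = variance.sqrt();
    Some(RunStats { mean, std_dev, cv: std_dev / mean })
}
```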

From duckdb-rs to SwanLake

For me, duckdb-rs and SwanLake are part of the same line of work.

duckdb-rs solved: how to use DuckDB elegantly inside Rust applications.

SwanLake solves: how to provide DuckDB as a shared, deployable, operable service for teams.

What I will keep working on

SwanLake is still evolving. My near-term focus is:

  1. more production-oriented reliability and load testing,
  2. better performance predictability on object storage backends,
  3. a more consistent developer experience across server and clients.

If you used duckdb-rs before, I would love you to try SwanLake and share feedback via issues or PRs.
