



















After handing duckdb-rs over to the DuckDB team in 2023, one question kept coming back to me:
If DuckDB is already great in-process, how do we turn that power into a service that is easier to integrate, deploy, and operate?
SwanLake is my answer to that question.
It is a Rust-based Arrow Flight SQL server, powered by DuckDB, with DuckLake-oriented extensions for datalake scenarios. In practice, SwanLake is built around a three-part combination: DuckDB + DuckLake + Flight SQL.

With duckdb-rs, the primary goal was clear: make DuckDB feel natural in Rust. That part worked well, but new constraints became obvious:
So SwanLake was never “just another wrapper”. I wanted a practical analytics service entrypoint.
You can read SwanLake as a five-layer system:
All query/update traffic enters through Flight SQL. This gives us a protocol that is efficient and language-neutral; the Rust/Go/Python examples in the repo validate this layer directly.
swanlake-core manages connection-scoped sessions:
peer_addr or peer_ip,I did not build a new engine. SwanLake wraps DuckDB for service use: each session has an isolated connection, startup preloads ducklake/httpfs/aws/postgres extensions, and SWANLAKE_DUCKLAKE_INIT_SQL can inject bootstrap SQL.
DuckLake is the key piece. Without it, DuckDB is mainly an excellent local analytical engine. With DuckLake, metadata and object-storage paths can be organized consistently, which makes DuckDB-based datalake services practical.
Runtime metrics (latency/slow query/errors), status endpoints (/ + status.json), and env-based configuration (SWANLAKE_*) form the operational surface. This layer is what makes the system observable and manageable in production.
SwanLake has a built-in status page (default :4215) plus status.json for machine consumption. It exposes:

I added this because these are exactly the signals I want when debugging production behavior.
BENCHMARK.md (CI artifact dated 2026-02-21) includes TPCH results at SF=0.1 where postgres_local_file outperforms postgres_s3 in that run.
| Metric (SF=0.1) | postgres_local_file | postgres_s3 |
|---|---|---|
| Throughput (req/s) | 10.428 | 4.867 |
| Avg latency (ms) | 382.751 | 818.041 |
| p95 latency (ms) | 829.236 | 1904.023 |
| p99 latency (ms) | 1116.002 | 2661.619 |
This is expected directionally: object storage paths usually add more variability.
One practical point is critical here: when using S3 or other remote object storage, you should usually enable cache_httpfs, otherwise latency, especially tail latency, can become very unstable.
This is already reflected in the benchmark workflow configuration. See .github/workflows/performance.yml:
postgres_s3 defaults to BENCHBASE_ENABLE_CACHE_HTTPFS=true,postgres_local_file defaults to BENCHBASE_ENABLE_CACHE_HTTPFS=false,But I do not think the takeaway is “local is always better”. A better takeaway is:
For me, duckdb-rs and SwanLake are part of the same line of work.
duckdb-rs solved: how to use DuckDB elegantly inside Rust applications.
SwanLake solves: how to provide DuckDB as a shared, deployable, operable service for teams.
SwanLake is still evolving. My near-term focus is:
If you used duckdb-rs before, I would love you to try SwanLake and share feedback via issues or PRs.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。