GitHub - tsouza/cerberus: Drop-in Prometheus / Loki / Tempo HTTP gateway for ClickHouse. Translate PromQL, LogQL, and TraceQL into optimized CH SQL

Drop-in Prometheus / Loki / Tempo HTTP gateway for ClickHouse. Keep Grafana, alerting, and your CLI tooling. Swap the backend.

Warning

EXPERIMENTAL — NOT PRODUCTION-READY. Cerberus is in the v1.0.0-rc.* release-candidate stage, early and under active development (see the releases page for the current tag). The differential harnesses run on every PR and score parity against real Prometheus / Loki / Tempo, but correctness, performance, and operational behaviour are still being shaken out, and the surface is evolving. Validate it against your own corpus before pointing anything real at it — do not stand it in for a running Prom / Loki / Tempo deployment without that evaluation — and expect breaking changes. See CHANGELOG.md for what has landed so far.

The three *QL compat badges are differential parity scores: passed / total cases where cerberus matched a reference Prometheus / Loki / Tempo on the same seeded corpus. The PromQL leg runs the third-party PromLabs / CNCF PromQL Compliance Tester (prometheus/compliance), the same tool the CNCF Prometheus Conformance Program uses, at 574/574 cases passing, no allow-list, against a real prom/prometheus. The scores are tracked, not gated: see Compatibility for exactly what the CI checks enforce.

Why cerberus?

Metrics, logs, and traces rarely share a store — the usual answer is Prometheus + Loki + Tempo, three retention policies and storage bills for what is largely the same OTLP data sliced three ways. ClickHouse is a great single store for all three signals; cerberus supplies the missing query side. Point Grafana at it as three datasources and your existing PromQL / LogQL / TraceQL keeps working, translated to ClickHouse SQL underneath.

No Grafana plugin. Cerberus speaks each upstream HTTP API verbatim (/api/v1/query_range, /loki/api/v1/query_range, /api/search, …). Grafana sees three normal datasources.
No custom QL. PromQL, LogQL, TraceQL — exactly as your dashboards and alerts already use them.
No reinvented parsers. Cerberus imports prometheus/promql/parser, grafana/loki/v3/pkg/logql/syntax, and grafana/tempo/pkg/traceql directly. If upstream parses it, cerberus parses it.
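As a rough illustration of "speaks each upstream HTTP API verbatim", the sketch below wraps curl around the three endpoints named above. It assumes all three heads are reachable on one base URL (the quick-start port 8080) and that the standard upstream query parameters apply unchanged. The function names and the CERBERUS_URL variable are hypothetical and not part of cerberus.

```shell
#!/bin/sh
# Assumption: all three heads are served from one base URL (quick-start port 8080).
CERBERUS_URL="${CERBERUS_URL:-http://localhost:8080}"

# PromQL: range query over the last hour at a 60s step (Prometheus HTTP API).
prom_range() {  # $1 = PromQL expression
  now=$(date +%s)
  curl -fsS -G "$CERBERUS_URL/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$((now - 3600))" \
    --data-urlencode "end=$now" \
    --data-urlencode "step=60"
}

# LogQL: same window; Loki accepts nanosecond Unix epoch timestamps.
loki_range() {  # $1 = LogQL expression
  now=$(date +%s)
  curl -fsS -G "$CERBERUS_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$((now - 3600))000000000" \
    --data-urlencode "end=${now}000000000" \
    --data-urlencode "limit=100"
}

# TraceQL: Tempo's search endpoint takes the query in the 'q' parameter.
tempo_search() {  # $1 = TraceQL expression
  curl -fsS -G "$CERBERUS_URL/api/search" \
    --data-urlencode "q=$1" \
    --data-urlencode "limit=20"
}
```

For example, `prom_range 'up'` or `tempo_search '{ status = error }'`. The queries are placeholders, and whether they return data depends on what is in your ClickHouse tables.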

Version requirements

Two axes decide whether a deployment is compatible: the ClickHouse server version cerberus queries, and the OTel schema shape the data was written in.

Component	Minimum	Notes
ClickHouse	24.8	The supported floor — the SQL cerberus emits is correct down to it. Enabling the experimental native rate requires 25.6 (below).
OTel exporter schema	clickhouseexporter 0.152.0	A schema shape, not a binary version — see below.

ClickHouse. 24.8 is the lowest version cerberus's emitted SQL is correct on: the 24.8 empty-input / parse-unit / filter-path quirks are all worked around unconditionally, so a query that runs on 24.8 runs on every newer server too. The differential compatibility harnesses — the source of truth for all three heads — execute on ClickHouse 25.8, so the validated SQL is exercised forward of the floor as well. Enabling the experimental native-rate path (CERBERUS_EXPERIMENTAL_TS_GRID_RANGE, default off) raises the floor to 25.6: it lowers eligible rate(<counter>[range]) range queries to the compiled timeSeriesRateToGrid aggregate, which exists only from ClickHouse 25.6. With the flag off, 24.8 is sufficient. See docs/operations.md for the runtime contract and the experimental-setting details.
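A pre-flight check against these floors could look like the sketch below. It compares only the major.minor part of a version string. The helper name is hypothetical, and the commented usage assumes clickhouse-client can reach your server under the host name shown.

```shell
#!/bin/sh
# Returns 0 when a ClickHouse version string ($1, e.g. "25.8.2.29") is at or
# above a major.minor floor ($2, e.g. "24.8"); returns 1 otherwise.
version_at_least() {
  v_major=${1%%.*}; v_rest=${1#*.}; v_minor=${v_rest%%.*}
  f_major=${2%%.*}; f_rest=${2#*.}; f_minor=${f_rest%%.*}
  [ "$v_major" -gt "$f_major" ] || {
    [ "$v_major" -eq "$f_major" ] && [ "$v_minor" -ge "$f_minor" ]
  }
}

# Hypothetical usage (host name is an assumption):
#   ch_version=$(clickhouse-client --host clickhouse --query 'SELECT version()')
#   version_at_least "$ch_version" 24.8 || echo "below the supported floor" >&2
#   version_at_least "$ch_version" 25.6 || echo "native-rate path unavailable" >&2
```

The 24.8 and 25.6 thresholds are the ones stated above. Which one applies depends on whether the experimental native-rate setting is enabled.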

OTel schema — the shape, not the exporter. Cerberus reads the OpenTelemetry ClickHouse schema shape pinned to clickhouseexporter v0.152.0 (via the tsouza/…:cerberus-ddl fork in go.mod). What matters is the table layout — column names, types, and Map shapes — not which binary produced it. Any exporter, collector pipeline, or ingestion path that writes tables in that shape works; the exporter binary version itself is irrelevant. If your layout deviates from the exporter defaults, point cerberus at it with the CERBERUS_SCHEMA_* overrides — see docs/configuration.md.

Quick start

git clone https://github.com/tsouza/cerberus.git && cd cerberus
docker compose up --wait
open http://localhost:3000   # Grafana (auto-login as admin); cerberus on :8080 (open is macOS; use xdg-open on Linux)

That builds cerberus, boots single-node ClickHouse, loads a deterministic OTel fixture (logs / traces / metrics), and brings up Grafana pre-provisioned with cerberus as three datasources. A fresh dashboard populates in ~30s; docker compose down -v wipes the volume.

From a published release

Cerberus is one stateless binary configured via environment variables. Pin an explicit tag — :latest only moves with stable releases:

docker pull ghcr.io/tsouza/cerberus:<tag>
docker run --rm -p 8080:8080 -e CERBERUS_CH_ADDR=clickhouse:9000 \
  ghcr.io/tsouza/cerberus:<tag>
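
Once the container is running, a script may want to wait until cerberus reports ready before sending queries. The sketch below polls a URL until it answers with a success status. It assumes the /readyz probe (described in docs/health.md) is served on the same port as the query API. The helper name is hypothetical.

```shell
#!/bin/sh
# Polls a readiness endpoint until it answers 2xx or the attempts run out.
wait_ready() {  # $1 = URL, $2 = max attempts (default 30)
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; the body is discarded.
    if curl -fsS -o /dev/null "$1"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Assumed usage:
#   wait_ready http://localhost:8080/readyz && echo "cerberus is ready"
```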

Prebuilt binaries (linux / darwin × amd64 / arm64) are on the release page; each release ships a SLSA build provenance attestation:

gh attestation verify cerberus_*_linux_amd64.tar.gz --repo tsouza/cerberus

Cerberus is configured entirely through CERBERUS_* environment variables — see the full configuration reference. The surrounding runtime contract (lifecycle, scaling, the solver and experimental knobs in context) lives in docs/operations.md.

Architecture

Cerberus has one query pipeline, not three. Each head parses with its reference upstream parser and lowers to a shared plan IR (internal/chplan); a rule-based optimiser rewrites it; the closed typed-Frag internal/chsql emitter produces parameterised, escape-free ClickHouse SQL; and the engine streams results. The three HTTP heads plug in as thin Lang adapters over internal/engine, so the optimiser and emitter never know which head produced a plan — new optimisations cost one implementation, not three.

See docs/engine.md for the Lang contract, the request lifecycle, and the per-stage breakdown (IR algebra, optimiser rules, the typed-SQL emitter, the OTel schema). For how cerberus keeps queries fast — the compute-fan-out strategy and per-layer optimisations — see docs/performance.md.

Rate-over-range is exact by default. rate(…) range queries match reference Prometheus bit-for-bit and stay sub-second at realistic scale. For million-row queries an experimental native ClickHouse path (timeSeriesRateToGrid) trades a sub-observable last-bit rounding difference for flat memory and an order-of-magnitude speed-up — see the exactness-vs-scale tradeoff guide.

Compatibility

Each query language has a differential harness: cerberus and a reference engine answer the same corpus against the same seeded data, and the responses are diffed case-for-case — pinning observed semantics on real ClickHouse against an upstream oracle, not just emitted SQL.
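
The idea of a differential parity score can be sketched in a few lines. This is not the project's harness: it is a deliberately naive illustration that sends each query to two engines at the same fixed timestamp and compares the raw response bodies byte for byte. The URLs, variable names, and function names are hypothetical, and a real harness would normalise responses before diffing.

```shell
#!/bin/sh
# Hypothetical endpoints: the engine under test and the reference engine.
CANDIDATE_URL="${CANDIDATE_URL:-http://localhost:8080}"
REFERENCE_URL="${REFERENCE_URL:-http://localhost:9090}"

# Instant query at a fixed timestamp so both sides evaluate the same instant.
instant_query() {  # $1 = base URL, $2 = PromQL, $3 = unix timestamp
  curl -fsS -G "$1/api/v1/query" \
    --data-urlencode "query=$2" \
    --data-urlencode "time=$3"
}

# Reads one query per line on stdin and prints "passed/total".
parity_score() {  # $1 = unix timestamp
  passed=0
  total=0
  while IFS= read -r query; do
    [ -n "$query" ] || continue          # skip blank lines
    total=$((total + 1))
    got=$(instant_query "$CANDIDATE_URL" "$query" "$1") || got="candidate-error"
    want=$(instant_query "$REFERENCE_URL" "$query" "$1") || want="reference-error"
    if [ "$got" = "$want" ]; then
      passed=$((passed + 1))
    else
      echo "DIFF: $query" >&2            # report the failing case on stderr
    fi
  done
  echo "$passed/$total"
}
```

Usage would be along the lines of `parity_score 1700000000 < queries.txt`, where queries.txt is a hypothetical file holding one PromQL expression per line.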

The strongest leg is PromQL, which runs the third-party PromQL Compliance Tester (prometheus/compliance, the PromLabs / CNCF Prometheus Conformance Program tooling) against a real prom/prometheus, seeded identically on both sides via remote-write. 574/574 cases pass, no allow-list. LogQL diffs against a real Loki on Grafana's own pkg/logql/bench corpus — solid, but a Grafana bench corpus rather than a standardised conformance suite. TraceQL is the lighter leg: there is no third-party TraceQL conformance suite, so its corpus is cerberus-owned (author-written TXTAR), and its numerical confidence is correspondingly lower than PromQL's.

Head	Reference + corpus	Required check	Conformance leg
PromQL	real `prom/prometheus` vs `prometheus/compliance` (PromLabs / CNCF)	`compatibility/prometheus`	third-party conformance suite (strongest)
LogQL	real Loki vs `grafana/loki:pkg/logql/bench` corpus	`compatibility/loki`	real-backend diff, Grafana bench corpus
TraceQL	real Tempo vs cerberus-owned TXTAR corpus	`compatibility/tempo`	author-written corpus (lightest)

just compat-all          # or compat-promql / compat-logql / compat-traceql

What the required checks enforce. The three compatibility/<head> checks run on every PR and fail on infrastructure breakage (stack won't boot, seed fails, report unparseable). Per-case parity drift is report-only by design (#503): it is recorded in report.json and rendered into the live compat-score.json badge, but does not turn the required check red. The one lane that hard-fails on any parity diff is compatibility/prometheus-forced-route (FAIL_ON_DIFF=1, proving the sharded solver route is byte-identical to reference Prometheus over the whole corpus) — that lane is informational, not a required check. The honest reading: the badges are a continuously re-measured conformance score, not a merge gate on numeric correctness.

No allow-lists — every diff against the reference is a real bug to fix at the source, not an exception to suppress. The full playbook (per-head drivers, local reproduction, rejection parity, the sole pinned upstream-skip-baseline contract) is in docs/compatibility.md.

Testing

Cerberus is tested in a 13-layer map spanning AST-shape pinning, plan-IR invariants, optimiser properties, emitted-SQL goldens, chDB roundtrips, function-surface parity, HTTP wire conformance, differential harnesses, Playwright UX flows, deterministic chaos / goleak, perf benchmarks + compute-fan-out guards, live-stack chaos against the k3d deployment, and an oracle-based property framework. just test runs the core lanes; see docs/test-strategy.md for the canonical layer map, the CI-gate inventory, and the gremlins rollout.

Documentation

Doc	What's in it
`docs/engine.md`	The shared query pipeline, the `Lang` contract, and the per-stage breakdown.
`docs/coverage.md`	Per-function / per-construct support status across PromQL / LogQL / TraceQL.
`docs/configuration.md`	The full `CERBERUS_*` environment-variable reference, grouped by area, with types and defaults.
`docs/operations.md`	Runtime contract: lifecycle, scaling, the solver and experimental knobs in context.
`docs/performance.md`	The compute-fan-out strategy, per-layer optimisations, and how they're held against regression.
`docs/solver.md`	The sharded-pushdown solver: eligibility, slicing, execution, and the cancellation contract.
`docs/benchmarks.md`	Benchmark methodology and the recorded numbers (regenerable).
`docs/compatibility.md`	The differential-harness playbook for all three heads.
`docs/test-strategy.md`	The 13-layer test map and CI-gate inventory.
`docs/observability.md`	Self-observability across logs / metrics / traces (OTLP export).
`docs/health.md`	`/readyz` / `/healthz` probe semantics.
`docs/upstream-forks.md`	The `tsouza/*` parser-fork + Dependabot-watch flow.
`docs/forbid-skip.md`	The forbidden-pattern reference for the `forbid-skip` gate.

Contributing

Smaller PRs (a new optimiser rule, a TXTAR fixture, a parser-dep bump) are welcome any time; for a large one, open an issue or discussion first. The local-dev and end-to-end commands live in CONTRIBUTING.md.
