GitHub - Rocketgraph/rocketgraph: Agent layer for observability

Self-hosted log clustering and streaming anomaly detection that drops in next to the observability stack you already run.

Why?

Your monitoring tool tells you what you searched for. It rarely tells you what's unusual right now.

Rocketgraph sits next to whatever you already pay for — Datadog, New Relic, Loki, CloudWatch, Sentry, ClickHouse — pulls a window of logs, mines structural templates, and flags the anomalous ones. It runs entirely inside your network. Your logs never leave your VPC. There's no SaaS tier to pay for.

What's in here

Component	What it does
🧠 ML engine	Clusters logs into structural templates and detects anomalies. Pulls directly from your existing log source — no parallel ingest pipeline.
⚡ `@rgraph/otel-node`	AI agent that auto-instruments any Node.js service with OpenTelemetry in ~90 seconds.

Try it in 90 seconds

git clone https://github.com/Rocketgraph/rocketgraph
cd rocketgraph/ml
cp .env.example .env             # fill in whichever sources you have
docker compose up --build        # → http://localhost:9020

Point it at any source you already use:

curl 'http://localhost:9020/clusters?source=loki&window=1h'

Or skip the credentials entirely: export a log file and analyse it locally. A Datadog export (CSV/JSON), the output of kubectl logs > app.log, or any raw log will do:

curl -XPOST 'http://localhost:9020/clusters/train?source=file'   # FILE_PATH=/data/app.log

See the one-command log-file quickstart.

That's the whole install. No schemas to provision, no accounts to create, no agents on hosts.

👉 Deep dive: ml/README.md for the ML engine · packages/otel-node for the OTel agent
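For scripting, the /clusters query from the quick start can also be issued from Python. This is a minimal sketch using only the standard library. The helper names `clusters_url` and `fetch_clusters` are hypothetical, and because the response format is not described here, the body is returned as raw text rather than parsed.

```python
import urllib.parse
import urllib.request


def clusters_url(source: str, window: str, base: str = "http://localhost:9020") -> str:
    """Build the same URL the curl example uses (helper name is hypothetical)."""
    query = urllib.parse.urlencode({"source": source, "window": window})
    return f"{base}/clusters?{query}"


def fetch_clusters(source: str, window: str, timeout: float = 30.0) -> str:
    """GET the clusters endpoint and return the raw response body.

    The response schema is not documented in this README, so nothing
    is assumed about it; inspect the text before parsing it.
    """
    with urllib.request.urlopen(clusters_url(source, window), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Safe to run without the engine: this only builds the URL.
print(clusters_url("loki", "1h"))
```

Call `fetch_clusters("loki", "1h")` once the container is up; it raises `urllib.error.URLError` if nothing is listening on port 9020.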

How it works (30-second version)

Three deterministic algorithms in sequence — no LLM, no hallucination, fully reproducible:

Drain3 mines structural templates from raw log lines.
Isolation Forest scores templates per service to surface the unusual ones.
Half-Space-Trees scores brand-new logs against the trained model in real time.

On a real production burst we test against: 2M logs → 58 templates → 9 anomalies, 90 seconds wall-clock, single container. Full details in ml/README.md.
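To make the first step concrete, here is a minimal sketch of how raw lines collapse into structural templates. It is not Drain3, which builds a parse tree and learns wildcard positions from token similarity. This stand-in only masks numeric tokens with regular expressions. The function names and sample lines are illustrative assumptions.

```python
import re
from collections import Counter

# Order matters: floats must be masked before plain integers,
# otherwise "19.99" would become "<NUM>.<NUM>".
MASKS = [
    (re.compile(r"\b\d+\.\d+\b"), "<FLOAT>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable-looking tokens with placeholders."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


def mine_templates(lines):
    """Count how many raw lines collapse into each template."""
    return Counter(to_template(line.strip()) for line in lines if line.strip())


# Made-up sample lines, shaped like the templates in the example output.
logs = [
    "Charge 1001 authorized for $19.99",
    "Charge 1002 authorized for $5.00",
    "Token refreshed for session 42",
    "Token refreshed for session 43",
    "Token refreshed for session 44",
]

templates = mine_templates(logs)
for template, count in templates.most_common():
    print(f"{count:>5}  {template}")
```

Five lines become two templates here. The per-template counts are the kind of signal the anomaly-scoring step can then work with, since a template with very few lines stands out from the rest.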

Examples

Analyse a log file locally — `analyze.py`

The fastest way to see Rocketgraph work: drop a log file in ./logs/, run one command, and get a cluster table with the anomalies flagged. No accounts, no API keys, nothing leaves your machine. Add --ai for an optional Claude triage on top — the engine itself stays deliberately LLM-free and reproducible; the model only explains the deterministic clusters.

cd example-setups/logfile-quickstart

docker compose up --build -d            # ML engine on http://localhost:9020
python gen_sample_log.py                # or: cp ~/Downloads/whatever.log ./logs/file.log
pip install requests                    # anthropic too, if you'll use --ai

python analyze.py                       # table of all clusters
python analyze.py --anomalies-only      # just the flagged ones
python analyze.py --ai                  # table + AI triage
python analyze.py mylogs.log --ai       # a specific file

analyze.py auto-detects the file, points the engine at it, pulls the clusters, and prints them. ~15,000 raw lines collapse to ~11 structural templates; the brand-new "database failover" template — 8 lines, never seen before, error level — comes back flagged as an anomaly. No rules written, no labels:

15188 logs → 11 clusters (3 anomalous)

  ANOM SERVICE        LOGS DEPTH  TEMPLATE
  ----------------------------------------
   *   payment-svc       8     3  Database failover: replica <*> promoted to primary after ...
   *   auth-svc       1573     2  Token refreshed for session <NUM>
       payment-svc    1686        Charge <NUM> authorized for $<FLOAT>
       ...

Reading the table: ANOM marks the clusters Isolation Forest flagged; LOGS is how many raw lines collapsed into that template; DEPTH is the isolation depth, shown only for anomalous clusters (lower = more anomalous); TEMPLATE is the structural pattern Drain3 mined. The flagged failover cluster is rare and new, which is exactly what surfaces it.
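Why a lower depth means "more anomalous" is easiest to see in a toy version of the isolation idea. The sketch below is a one-dimensional illustration, not the engine's implementation: it repeatedly splits a sample at a random point and counts how many splits it takes to separate one value from the rest. The feature (log lines per template) and all numbers except the 8, 1573, and 1686 counts from the table are assumptions.

```python
import random


def isolation_depth(value, sample, rng, depth=0, max_depth=12):
    """Number of random splits needed to separate `value` from `sample`."""
    if len(sample) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(sample), max(sample)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the points on the same side of the split as `value`.
    if value < split:
        side = [x for x in sample if x < split]
    else:
        side = [x for x in sample if x >= split]
    return isolation_depth(value, side, rng, depth + 1, max_depth)


def mean_depth(value, sample, trees=200, seed=0):
    """Average isolation depth over many random trees (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(isolation_depth(value, sample, rng) for _ in range(trees)) / trees


# Hypothetical feature: log lines per template within one window.
counts = [1500, 1573, 1620, 1686, 1710, 1590, 1655, 8]

rare = mean_depth(8, counts)
common = mean_depth(1620, counts)
print(f"rare template (8 lines):      mean depth {rare:.2f}")
print(f"common template (1620 lines): mean depth {common:.2f}")
```

The 8-line template sits far from the others, so a single random split usually isolates it, while a value inside the dense group needs several. A real Isolation Forest applies the same idea to multi-dimensional features and random subsamples.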

With --ai, the same clusters are handed to Claude for an SRE-style triage — likely incident, ranked root-cause hypotheses, and concrete next steps — grounded only in the clusters above. Full walkthrough in the log-file quickstart.

End-to-end reference apps

example-setups/ also contains reference apps you can point otel-node at to see the whole pipeline working — instrument the service, ship OTLP into your sink, then watch Rocketgraph cluster and flag the logs.

Example	What it shows
`bookstore-app`	Express + TypeScript service auto-instrumented by `@rgraph/otel-node` — the easiest way to see traces, metrics, and logs flowing into Rocketgraph end-to-end.

More examples (Fastify, NestJS, Next.js) are on the roadmap — PRs welcome.

Compatibility

Status	Platforms
✅ Supported	Log file (`.log`/`.json`/`.csv`) · OpenTelemetry · Loki · New Relic · Datadog · CloudWatch · Sentry · ClickHouse
🛣️ Roadmap	Splunk · Elastic / OpenSearch · Azure Monitor · GCP Cloud Logging

Community

💬 Discord — support and design discussions
🐛 GitHub Issues — bugs and feature requests
🐦 @RGraphql — release notes

Contributing

PRs welcome. The most impactful contributions right now:

New ML connectors (Splunk, OpenSearch, Azure Monitor, GCP Cloud Logging)
Additional framework support in @rgraph/otel-node (Fastify, NestJS, Remix, Bun-native services)
More end-to-end reference apps under example-setups/

See ml/README.md and packages/otel-node for the deep-dive docs.

License

Apache 2.0. See LICENSE.

Self-hosted. Open source. Drops in next to what you already run.
rocketgraph.app
