GitHub - sturnus-dev/sturnus: An OpenAI-compatible LLM proxy that flocks toward the fastest provider
dannyboland · 2026-06-22 · via Hacker News: Show HN


Automatic latency-based routing across LLM providers. A single static binary, zero infrastructure.

LLM providers have variable latency and availability that can break production features. sturnus is a lightweight sidecar that sits beside your app, exposes an OpenAI-compatible API, and automatically shifts traffic to whichever provider is fastest and available right now.

Quick start

sturnus needs a config.toml — copy config.example.toml and add your providers.

Docker — best for production deployments and Kubernetes sidecars:

docker run -v ./config.toml:/config.toml \
  -p 4000:4000 \
  ghcr.io/sturnus-dev/sturnus:latest

cargo install — best for local testing if you have a Rust toolchain:

cargo install sturnus
sturnus --config config.toml

Prebuilt static binaries for Linux and macOS (x86_64 and aarch64) are attached to every release.

Then point any OpenAI-compatible SDK at sturnus — the only change is the base URL:

- client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")
+ client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="unused")

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="unused")
response = client.chat.completions.create(
    model="fast",  # resolved by sturnus to the fastest available candidate
    messages=[{"role": "user", "content": "Hello"}],
)

Features

  • Latency- and error-aware routing — the fastest healthy provider gets the bulk of traffic, while slower or erroring ones keep a small, shrinking share. That share doubles as a probe, so a recovered provider wins its traffic back automatically, with no thresholds to trip.
  • Session affinity — a stateless x-session-affinity header pins follow-up requests to the same provider across pods.
  • Transparent passthrough — only the model field is rewritten: the request body is otherwise forwarded byte-for-byte, preserving key order, number precision, and formatting. Responses, including SSE text/event-stream chunks, are relayed untouched as they arrive.
  • Memory-bounded — request buffers are capped per request and in aggregate; bursts beyond the memory budget shed load with 429 + Retry-After instead of OOMing the pod.
  • Vertex AI support — GKE Workload Identity auth via the metadata server, with automatic token refresh.
  • Zero infrastructure — a single static binary; no Redis, database, or control plane.

Why sturnus

Most LLM gateways are either a hosted SaaS you route all your traffic (and keys) through, or a large application with a significant surface area. sturnus is the opposite — a single static binary with a small auditable surface area, MIT-licensed and running entirely inside your infrastructure. It speaks the OpenAI API, so any OpenAI-compatible SDK works by changing one base URL. The core capability of sturnus is automatic latency-based routing across providers — something that most gateways put behind an enterprise tier. Each sidecar routes independently from what it observes locally, so there is no shared state to run.

If you need a full LLMOps platform (spend tracking, prompt management, a UI, dozens of integrations), sturnus is not that.

Design choices & deliberate omissions

sturnus has a bounded scope by design and has some deliberate omissions:

  • No request-level failover or retries. sturnus is a transparent proxy: it surfaces upstream errors to the client verbatim rather than silently retrying within a black box. Error responses still feed the routing signal, so a flaky provider is quickly deprioritized for subsequent traffic — but the individual failed request is returned as-is. Client SDKs (OpenAI, Anthropic, LangChain, etc.) already ship mature, configurable retry and backoff; configure it there and let sturnus steer those retries toward the healthiest provider.
  • Latency-based, not cost- or quality-based. Routing optimizes time-to-first-chunk within an alias, and every model routed under that alias should be largely interchangeable. sturnus never trades quality or cost for speed; it just picks the fastest among options you've already deemed equivalent.
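
Because sturnus returns a failed request as-is, retrying is the client's job. The OpenAI Python SDK already covers this through the max_retries argument on the client; the sketch below only illustrates the general shape of such a policy with the standard library. UpstreamError, RETRYABLE, and call_with_retries are hypothetical names for this example, not part of sturnus or any SDK.

```python
import random
import time


class UpstreamError(Exception):
    """Hypothetical error carrying the HTTP status and Retry-After seconds, if any."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"upstream returned {status}")
        self.status = status
        self.retry_after = retry_after


# Assumed set of statuses worth retrying; tune to your providers.
RETRYABLE = {429, 500, 502, 503, 504}


def call_with_retries(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds, retrying retryable errors with backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except UpstreamError as err:
            is_last = attempt == max_attempts - 1
            if err.status not in RETRYABLE or is_last:
                raise
            # Prefer the server's Retry-After hint; otherwise back off with jitter.
            backoff = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(err.retry_after if err.retry_after is not None else backoff)
```

Each retry is a fresh request to sturnus, so it is routed with the updated weights and is likely to land on a healthier candidate than the one that just failed.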

Contents

  • Configuration
  • Endpoints
  • Observability
  • Session affinity
  • How routing works
  • Docker
  • Building

Configuration

# use 127.0.0.1:4000 if running locally rather than in a container
listen = "0.0.0.0:4000"

# Providers: where to send requests
[provider.openai]
base_url = "https://api.openai.com/v1"
api_key = "${OPENAI_API_KEY}"

# Vertex AI via GKE Workload Identity (no API key needed)
[provider.vertex]
vertex_ai = { project_id = "my-gcp-project", location = "us-central1" }

# Model map: aliases the client uses → provider+model candidates
[model]
fast = [
  { provider = "openai", model = "gpt-4o-mini" },
  { provider = "vertex", model = "google/gemini-2.5-flash" },
]

[routing]
ewma_alpha = 0.3          # smoothing for the latency and success-rate EWMAs (higher = more reactive)
error_threshold = 0.5      # error-rate EWMA above which a session-affinity pin is broken (routing weights are unaffected)

See config.example.toml for all providers (Groq, Azure, Google AI Studio, Anthropic, local OpenAI-compatible) and options.

Environment variables in ${VAR} syntax are interpolated at config load time. If the variables are defined in a .env file (KEY=VALUE per line), pass it with --env-file:

sturnus --env-file /secrets/.env

Vertex billing attribution

For Vertex providers, sturnus can inject sidecar-controlled labels into outbound requests so the resulting spend shows up tagged in GCP Billing Export. The labels live in a top-level [attribution] block (typically deployment identity sourced from env vars) and are merged into each request body for any Vertex provider that opts in:

[attribution]
service = "${SERVICE_NAME}"
owner = "${OWNER}"
env = "${ENV}"

[provider.vertex]
vertex_ai = { project_id = "my-project", location = "us-central1", attribution = true }

Sidecar keys take precedence over any client-supplied labels keys with the same name; disjoint client keys are preserved. The feature is currently scoped to Vertex only. Keys and values must conform to Vertex naming rules ([a-z][a-z0-9_-]{0,62}).
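
The precedence and naming rules above can be illustrated with a short sketch. This is not sturnus's implementation (which is written in Rust); merge_labels and LABEL_RE are hypothetical names, and the assumption is that validation is applied to the sidecar-controlled labels.

```python
import re

# Naming rule quoted above: a lowercase letter followed by up to 62 more characters.
LABEL_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")


def merge_labels(client_labels, sidecar_labels):
    """Merge labels so that sidecar keys win and disjoint client keys survive."""
    for key, value in sidecar_labels.items():
        if not LABEL_RE.fullmatch(key) or not LABEL_RE.fullmatch(value):
            raise ValueError(f"invalid label {key!r}={value!r}")
    merged = dict(client_labels)   # start from what the client sent
    merged.update(sidecar_labels)  # sidecar values overwrite same-named keys
    return merged


print(merge_labels({"env": "dev", "team": "search"}, {"env": "prod", "service": "api"}))
```

Here the client's env value is replaced by the sidecar's, while team passes through untouched.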

Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/chat/completions | Proxied to upstream (model alias resolved) |
| POST | /v1/embeddings | Proxied to upstream (model alias resolved) |
| GET | /health | Returns {"status":"ok"} |
| GET | /status | Returns current streaming/non-streaming EWMAs, error rate, and status per candidate |
| GET | /metrics | Prometheus metrics (see below) |

Observability

Metrics

Prometheus metrics on /metrics, all labelled by alias, provider, model:

| Metric | Type | Meaning |
|--------|------|---------|
| sturnus_requests_total | counter | Completed responses, additionally labelled by status_code (includes upstream 4xx/5xx) |
| sturnus_ttfc_seconds | histogram | Streaming time-to-first-chunk (streaming requests only) |
| sturnus_latency_seconds | histogram | Non-streaming full response time (non-streaming requests only) |
| sturnus_errors_total | counter | Transport failures that never produced a response (timeout, connect, DNS) |
| sturnus_buffer_rejections_total | counter | Requests shed with 429 because the aggregate buffer budget was full (no per-alias labels) |

Connection failures are zero-initialised at startup so a missing series is never mistaken for "no errors".

Logging

Structured logging via tracing: coloured text on a terminal (respecting NO_COLOR), newline-delimited JSON when piped or redirected. Set the format with --log-format <auto|pretty|json> (or STURNUS_LOG_FORMAT) and the level with RUST_LOG (default sturnus=info).

Each request gets a span with a request_id; a client-supplied W3C traceparent propagates as trace_id and parent_span_id for cross-service correlation.
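
For clients that do not already run a tracing library, a traceparent value can be generated by hand. The sketch below follows the W3C Trace Context layout (version, 16-byte trace ID, 8-byte parent ID, flags); new_traceparent is a hypothetical helper name.

```python
import re
import secrets

TRACEPARENT_RE = re.compile(r"00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}")


def new_traceparent(sampled=True):
    """Build a version-00 W3C traceparent header value with random IDs."""
    trace_id = secrets.token_hex(16)  # 32 lowercase hex characters
    parent_id = secrets.token_hex(8)  # 16 lowercase hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


header = new_traceparent()
assert TRACEPARENT_RE.fullmatch(header)
print(header)
```

With the OpenAI Python SDK the value can be sent per request through extra_headers={"traceparent": header}, the same mechanism used for the session-affinity header.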

Session affinity

Every response includes an x-session-affinity header (e.g. openai/gpt-4o-mini). Pass it back on subsequent requests to pin to the same provider — useful for multi-turn conversations where context is provider-specific:

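from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="unused")

# with_raw_response exposes the HTTP headers; a plain create() returns only the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="fast",
    messages=[{"role": "user", "content": "Hello"}],
)
affinity = raw.headers["x-session-affinity"]  # e.g. "openai/gpt-4o-mini"
response = raw.parse()  # the usual ChatCompletion object

# Send the header back to pin the follow-up to the same provider
response = client.chat.completions.create(
    model="fast",
    messages=[{"role": "user", "content": "Follow-up"}],
    extra_headers={"x-session-affinity": affinity},
)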

Fully stateless — works across pods with no shared state. The pin is honored until the pinned candidate's error-rate EWMA breaches error_threshold (at the default smoothing, roughly two consecutive errors), at which point the header is ignored and a new provider is selected — check the updated x-session-affinity in the response. Unknown or malformed headers fall back to normal routing.
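
The "roughly two consecutive errors" figure follows from the default settings. The sketch below assumes the standard EWMA update (new = alpha * sample + (1 - alpha) * old), an error sample of 1.0, and a candidate that starts with an error rate of 0; errors_until_pin_breaks is a hypothetical helper, not a sturnus API.

```python
def errors_until_pin_breaks(alpha=0.3, threshold=0.5, start=0.0):
    """Count consecutive errors until the error-rate EWMA exceeds the threshold."""
    if not 0 < alpha <= 1 or not 0 <= threshold < 1:
        raise ValueError("alpha must be in (0, 1] and threshold in [0, 1)")
    rate, errors = start, 0
    while rate <= threshold:
        rate = alpha * 1.0 + (1 - alpha) * rate  # each error is a sample of 1.0
        errors += 1
    return errors


# With alpha = 0.3 the EWMA is 0.30 after one error and 0.51 after two.
print(errors_until_pin_breaks())
```

A lower ewma_alpha makes pins stickier: under the same assumptions, alpha = 0.1 needs seven consecutive errors to cross 0.5.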

How routing works

  1. Client sends POST /v1/chat/completions with "model": "fast".
  2. Sidecar looks up the fast alias and computes each candidate's effective latency: its latency EWMA divided by its success-rate EWMA. A candidate erroring with probability p needs ~1/(1-p) attempts per success, so errors inflate effective latency the same way slowness does.
  3. Each candidate is weighted by (best_effective / its_effective)^k, so the best gets the bulk of traffic and worse ones a shrinking-but-nonzero share. A deterministic low-discrepancy sequence (golden-ratio Weyl sequence) turns those weights into picks.
  4. Because worse candidates always keep a small share, their EWMAs stay fresh — a provider that recovers (faster responses or errors stopping) wins traffic back automatically; a cold candidate (no latency data yet) probes at a quarter of the best candidate's rate, scaled by its success rate, until its first samples land.
  5. The model field is rewritten to the real model name, auth headers are set, and the request is forwarded.
  6. TTFC is measured at first chunk arrival and fed back into the EWMA; the response status (any non-2xx counts as an error, including upstream 4xx) feeds the success-rate EWMA.

The best provider is exploited heavily while worse ones keep enough traffic to stay measured. A candidate's probe share shrinks with how bad it looks but is floored at 1%, so re-detecting a recovered provider costs at most ~100 requests — and during an outage at most ~1% of an alias's traffic is spent on the failing candidate.
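
The weighting and selection steps can be sketched as follows. This is an illustration under stated assumptions rather than sturnus's code: the exponent k is not given in this document (2.0 is a placeholder), the 1% floor is applied to normalized shares before a final renormalization, and routing_shares and pick are hypothetical names.

```python
import math

# Fractional part of the golden ratio, the step of the Weyl sequence.
GOLDEN = (math.sqrt(5) - 1) / 2


def routing_shares(candidates, k=2.0, floor=0.01):
    """Map {name: (latency_ewma_s, success_rate_ewma)} to {name: traffic share}."""
    # Effective latency: errors inflate latency by ~1/(success rate).
    effective = {n: lat / max(ok, 1e-9) for n, (lat, ok) in candidates.items()}
    best = min(effective.values())
    weights = {n: (best / e) ** k for n, e in effective.items()}
    total = sum(weights.values())
    # Floor each share so every candidate keeps being probed, then renormalize.
    shares = {n: max(w / total, floor) for n, w in weights.items()}
    norm = sum(shares.values())
    return {n: s / norm for n, s in shares.items()}


def pick(shares, n):
    """Choose a candidate for the n-th request from a golden-ratio Weyl sequence."""
    u = (n * GOLDEN) % 1.0  # deterministic, evenly spread point in [0, 1)
    acc = 0.0
    for name, share in shares.items():
        acc += share
        if u < acc:
            return name
    return name  # guard against floating-point rounding at the upper edge


shares = routing_shares({"openai": (0.4, 1.0), "vertex": (0.5, 0.5)})
picks = [pick(shares, n) for n in range(1000)]
print(shares, picks.count("openai"))
```

Compared with random sampling, the low-discrepancy sequence keeps the realized traffic split close to the target weights even over short windows.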

Docker

When running in Docker or as a Kubernetes sidecar, listen must be 0.0.0.0:4000 (the value in config.example.toml) — 127.0.0.1 only accepts connections from within the container itself.

On Kubernetes, run sturnus as a native sidecar: an init container with restartPolicy: Always (enabled by default since v1.29, stable since v1.33). It then starts before the app container and is terminated after it, so the proxy is ready for the app's first request and stays up while the app drains.

Memory needs no tuning: the aggregate request-buffer budget defaults to half the container's memory limit (read from cgroups at startup, logged with its source), so a small sidecar sheds excess load with 429s rather than getting OOM-killed. Override with routing.max_buffered_bytes if you want a different ceiling.
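
To estimate what the default budget would be for a given container, the same calculation can be approximated from inside the pod. The sketch assumes cgroup v2, where the limit is exposed in /sys/fs/cgroup/memory.max as either a byte count or the literal string max; how sturnus itself reads the limit is not described here, and all function names are hypothetical.

```python
from pathlib import Path

# cgroup v2 location of the container memory limit (assumption: v2 is in use).
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")


def parse_memory_max(text):
    """Return the limit in bytes, or None when the cgroup is unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)


def default_buffer_budget(limit_bytes, fraction=0.5):
    """Half the memory limit, mirroring the default described above."""
    return None if limit_bytes is None else int(limit_bytes * fraction)


def detect_budget(path=CGROUP_V2_LIMIT):
    """Best-effort detection; returns None if no limit can be read."""
    try:
        return default_buffer_budget(parse_memory_max(path.read_text()))
    except (OSError, ValueError):
        return None


print(detect_budget())
```

For a sidecar limited to 256 MiB this yields 128 MiB, which is the ceiling routing.max_buffered_bytes would override.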

The image is published as a multi-arch (amd64/arm64) scratch container to ghcr.io/sturnus-dev/sturnus. Tags follow semver: :latest, :5.0, :5.0.0.

To inject secrets via a mounted .env file:

docker run -v ./config.toml:/config.toml \
  -v ./secrets.env:/secrets/.env:ro \
  -p 4000:4000 \
  ghcr.io/sturnus-dev/sturnus:latest --env-file /secrets/.env

Vertex credentials outside GKE

On GKE, workload identity is picked up automatically. Elsewhere, supply credentials one of two ways.

A service account key, pointed to by GOOGLE_APPLICATION_CREDENTIALS (recommended for production):

docker run -v ./config.toml:/config.toml \
  -v ./sa-key.json:/sa-key.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/sa-key.json \
  -p 4000:4000 \
  ghcr.io/sturnus-dev/sturnus:latest

Or gcloud ADC for local dev, mounted to $HOME/.config/gcloud/ (the image sets HOME=/root):

docker run -v ./config.toml:/config.toml \
  -v ~/.config/gcloud/application_default_credentials.json:/root/.config/gcloud/application_default_credentials.json:ro \
  -p 4000:4000 \
  ghcr.io/sturnus-dev/sturnus:latest

Building

# Development
cargo build

# Release (static binary with LTO)
cargo build --release

# Run tests
cargo test

License

MIT