Measuring AI Gateway Failover: 30 Days of Production Data

TL;DR: We measured failover time and added latency across three AI gateways (Bifrost, LiteLLM, Portkey) during 30 days of mirrored production traffic at Nexus Labs. Bifrost added the least overhead, 11ms at p99, with automatic provider fallback enabled. The model is the easy part. Routing it reliably is not.

Our agent platform at Nexus Labs handles around 2.4M LLM requests per day. Half of those hit OpenAI; the rest are spread across Anthropic, Bedrock, and Vertex. When OpenAI had its 4-hour incident on April 23, we lost 38 minutes of traffic before our homegrown retry logic gave up and rerouted.

That hurt. So we replaced the retry layer.

The actual problem

Most gateway benchmarks measure throughput on a happy path with no failures. That tells you very little about production. What I care about: how long does it take for a request to recover when a provider returns 429 or 503? And how much p99 latency does the gateway add when nothing is wrong?
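One way to put a number on the second question is to log, for every request, both the end-to-end latency the caller saw and the time spent waiting on the upstream provider, then take percentiles of the difference. The sketch below is a minimal illustration under that assumption, not the harness behind the numbers in this post: the `Sample` record and its field names are hypothetical, and it uses the nearest-rank definition of a percentile.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical per-request log record, both values in milliseconds.
    total_ms: float     # end-to-end latency seen by the agent service
    upstream_ms: float  # time spent waiting on the provider itself


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def gateway_overhead(samples: list[Sample]) -> dict[str, float]:
    # Overhead is whatever the gateway added on top of the provider call.
    overhead = [s.total_ms - s.upstream_ms for s in samples]
    return {"p50": percentile(overhead, 50), "p99": percentile(overhead, 99)}


if __name__ == "__main__":
    # Synthetic data: 98 requests with 5 ms of overhead, 2 with 25 ms.
    samples = [Sample(405.0, 400.0)] * 98 + [Sample(525.0, 500.0)] * 2
    print(gateway_overhead(samples))  # {'p50': 5.0, 'p99': 25.0}
```

Pairing the two timings per request matters. The p99 of the per-request difference is not the same thing as the gateway's p99 minus a separately measured baseline p99, because those are two different tail distributions.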

Our team of 9 engineers spent two weeks instrumenting three options. Same hardware (c6i.4xlarge, 2 nodes behind an NLB). Same upstream credentials. Same request distribution sampled from our actual logs.

Setup

Each gateway sat between our agent service and four providers. We configured identical fallback chains: OpenAI primary, Anthropic secondary, Bedrock tertiary. Cache disabled. Rate limits set to mirror our prod allocation.
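The fallback chain described above (OpenAI first, then Anthropic, then Bedrock, moving on whenever a provider returns a retryable error) can be sketched as a plain loop. This is a minimal illustration, not how Bifrost, LiteLLM, or Portkey implement it. The provider callables, `ProviderError`, and the set of status codes treated as retryable are all assumptions. Latencies are simulated rather than measured, so failover time falls out as the cost of each failed attempt plus the successful one.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Assumption: statuses that should move the request to the next provider.
RETRYABLE = {429, 500, 502, 503, 504}


class ProviderError(Exception):
    """Hypothetical error raised by a provider call."""

    def __init__(self, status: int, latency_ms: float):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.latency_ms = latency_ms  # time spent before the error came back


@dataclass
class Result:
    provider: str
    model: str
    text: str
    elapsed_ms: float  # simulated time summed over every attempt
    attempts: int


# Hypothetical provider call: (model, prompt) -> (text, latency_ms).
ProviderCall = Callable[[str, str], tuple[str, float]]


def complete_with_fallback(prompt: str, chain: list[tuple[str, str]],
                           providers: dict[str, ProviderCall]) -> Result:
    elapsed = 0.0
    last_error: ProviderError | None = None
    for attempt, (provider, model) in enumerate(chain, start=1):
        try:
            text, latency = providers[provider](model, prompt)
        except ProviderError as err:
            elapsed += err.latency_ms
            if err.status not in RETRYABLE:
                raise  # e.g. 400 or 401: another provider will not fix a bad request
            last_error = err
            continue  # switch to the next provider immediately, no re-queue
        return Result(provider, model, text, elapsed + latency, attempt)
    raise RuntimeError("every provider in the chain failed") from last_error


if __name__ == "__main__":
    def openai_down(model: str, prompt: str) -> tuple[str, float]:
        raise ProviderError(503, latency_ms=100.0)

    def healthy(model: str, prompt: str) -> tuple[str, float]:
        return f"{model}: ok", 50.0

    chain = [("openai", "gpt-4o"), ("anthropic", "claude-sonnet-4"),
             ("bedrock", "anthropic.claude-sonnet-4")]
    providers = {"openai": openai_down, "anthropic": healthy, "bedrock": healthy}
    print(complete_with_fallback("hello", chain, providers))
    # -> provider='anthropic', elapsed_ms=150.0, attempts=2
```

Non-retryable errors are re-raised instead of falling through. A malformed request or a bad key usually fails the same way on every provider, and trying the whole chain only adds latency before the caller sees the real error.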

Here's the Bifrost config we used:

```yaml
providers:
  openai:
    keys:
      - value: env.OPENAI_API_KEY
        weight: 1.0
  anthropic:
    keys:
      - value: env.ANTHROPIC_API_KEY
        weight: 1.0
  bedrock:
    keys:
      - value: env.AWS_BEDROCK_KEY
        weight: 1.0

fallbacks:
  - provider: openai
    model: gpt-4o
    fallback_to:
      - provider: anthropic
        model: claude-sonnet-4
      - provider: bedrock
        model: anthropic.claude-sonnet-4
```

Documented behavior is at https://docs.getbifrost.ai/features/retries-and-fallbacks. LiteLLM and Portkey have equivalent configs: different shape, same semantics.
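Because a config like this is plain text that can be reviewed like code, a cheap pre-merge check can catch the most common mistake: a fallback step that points at a provider with no credentials block. The sketch below is hypothetical CI tooling, not something Bifrost ships. It assumes the file has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`, a third-party package that is not shown). It only checks the fields that appear in the config above, and the rules themselves are illustrative.

```python
def validate_gateway_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems: list[str] = []
    providers = cfg.get("providers", {})

    # Every provider block needs at least one key; weights, if set, must be positive.
    for name, block in providers.items():
        keys = block.get("keys", [])
        if not keys:
            problems.append(f"provider {name!r} has no keys")
        for i, key in enumerate(keys):
            if "weight" in key and key["weight"] <= 0:
                problems.append(f"provider {name!r} key {i} has a non-positive weight")

    # Every step of every chain must point at a defined provider, and no
    # provider/model pair should appear twice in the same chain.
    for chain in cfg.get("fallbacks", []):
        steps = [(chain["provider"], chain["model"])]
        steps += [(f["provider"], f["model"]) for f in chain.get("fallback_to", [])]
        seen: set[tuple[str, str]] = set()
        for provider, model in steps:
            if provider not in providers:
                problems.append(f"{provider}/{model} references an undefined provider")
            if (provider, model) in seen:
                problems.append(f"{provider}/{model} appears twice in one chain")
            seen.add((provider, model))
    return problems


# The config above, in the shape yaml.safe_load would return it.
config = {
    "providers": {
        "openai": {"keys": [{"value": "env.OPENAI_API_KEY", "weight": 1.0}]},
        "anthropic": {"keys": [{"value": "env.ANTHROPIC_API_KEY", "weight": 1.0}]},
        "bedrock": {"keys": [{"value": "env.AWS_BEDROCK_KEY", "weight": 1.0}]},
    },
    "fallbacks": [{
        "provider": "openai", "model": "gpt-4o",
        "fallback_to": [
            {"provider": "anthropic", "model": "claude-sonnet-4"},
            {"provider": "bedrock", "model": "anthropic.claude-sonnet-4"},
        ],
    }],
}

if __name__ == "__main__":
    print(validate_gateway_config(config))  # []
```

Adding a fourth provider to a chain without adding its `providers` block is exactly the kind of change that looks fine in review and only fails during an incident, which is why a check like this belongs in CI rather than at runtime.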

Results

We ran 720 hours (30 days) of mirrored traffic. The numbers below come from the actual logs, not synthetic load.

| Gateway | p50 overhead | p99 overhead | Failover time (provider down) | Memory at 1k RPS |
|---|---|---|---|---|
| Bifrost | 3ms | 11ms | 180ms (one retry + switch) | 412 MB |
| LiteLLM | 8ms | 41ms | 620ms | 890 MB |
| Portkey (self-hosted) | 6ms | 29ms | 340ms | 650 MB |

Bifrost is written in Go. LiteLLM is Python with FastAPI. That accounts for most of the gap on the hot path. Not all of it. Bifrost's fallback chain evaluates synchronously without re-queuing the request, which matters when you're already on retry attempt two.
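The cost of re-queuing is easiest to see in a toy model. The sketch below is a deliberately simplified, hypothetical simulation (one worker, a FIFO queue, every attempt costing the same fixed time, no concurrency). It does not describe how any of the three gateways schedules work. One request fails on its first provider. Under the synchronous policy the same worker calls the fallback provider at once. Under the re-queue policy the retry goes to the back of the queue.

```python
from collections import deque


def simulate(requests: list[tuple[str, bool]], policy: str,
             attempt_ms: float = 10.0) -> dict[str, float]:
    """Toy single-worker FIFO model of one retry.

    requests: (request_id, fails_first_attempt) in arrival order, all queued at t=0.
    policy: "sync" retries on the next provider right away; "requeue" puts the
    retry at the back of the queue. Every attempt costs attempt_ms.
    Returns each request's completion time in ms.
    """
    if policy not in ("sync", "requeue"):
        raise ValueError(f"unknown policy {policy!r}")
    queue = deque((rid, fails, 1) for rid, fails in requests)
    clock = 0.0
    done: dict[str, float] = {}
    while queue:
        rid, fails, attempt = queue.popleft()
        clock += attempt_ms  # this attempt occupies the worker
        if fails and attempt == 1:
            if policy == "sync":
                clock += attempt_ms  # same worker calls the fallback provider now
                done[rid] = clock
            else:
                queue.append((rid, fails, 2))  # wait behind everything already queued
            continue
        done[rid] = clock
    return done


if __name__ == "__main__":
    reqs = [("r0", True)] + [(f"r{i}", False) for i in range(1, 10)]
    for policy in ("sync", "requeue"):
        done = simulate(reqs, policy)
        print(policy, done["r0"], done["r9"])
    # sync 20.0 110.0
    # requeue 110.0 100.0
```

Total work is the same under both policies. What changes is who waits: re-queuing pushes the cost onto the request that has already failed once, and that request is the one that lands in the tail during an incident.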

Portkey was solid, but its self-hosted version lags behind the managed offering in features. LiteLLM's killer feature for our team was its richer support for custom cost-tracking callbacks. We still use those for finance reporting.

What we used Bifrost for

Three things, specifically.

Fallback routing. When OpenAI returns 429, the request goes to Anthropic with the equivalent model. Our agent code never knows. Docs at https://docs.getbifrost.ai/features/retries-and-fallbacks.

Semantic caching. For our evaluation harness specifically. We replay 18,000 prompts against new model versions nightly. Cache hit rate is 73% because the evaluation suite asks the same questions repeatedly. That's around 13k requests we don't pay for each night. Reference: https://docs.getbifrost.ai/features/semantic-caching.
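A semantic cache stores each prompt's embedding alongside its response. It serves a cached response when a new prompt's embedding is close enough, rather than requiring an exact string match. The sketch below shows that lookup with cosine similarity. The bag-of-words `embed` function and the 0.9 threshold are placeholders chosen for illustration (a real deployment would use an embedding model and tune the threshold). Partitioning entries by model is also an assumption of this sketch. This is not how Bifrost implements its cache.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Placeholder "embedding": a lowercase bag of words. A real semantic
    # cache would call an embedding model here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # minimum similarity that counts as a hit
        # Entries are partitioned by model so one model never receives an
        # answer cached for another; linear scan for clarity, not speed.
        self.entries: dict[str, list[tuple[Counter, str]]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, model: str, prompt: str) -> str | None:
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries.get(model, []):
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.hits += 1
            return best_response
        self.misses += 1
        return None

    def put(self, model: str, prompt: str, response: str) -> None:
        self.entries.setdefault(model, []).append((embed(prompt), response))


if __name__ == "__main__":
    cache = SemanticCache(threshold=0.9)
    cache.put("gpt-4o", "What is the capital of France?", "Paris")
    print(cache.get("gpt-4o", "what is the capital of france"))  # Paris
    print(cache.get("gpt-4o", "Summarize this contract"))  # None
```

At the quoted hit rate, 18,000 × 0.73 = 13,140 prompts a night are answered from cache, which is the "around 13k" figure. The threshold is the knob that matters. Set it too low and different questions share an answer. Set it too high and paraphrases miss.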

Prometheus metrics. Native export. We already had a Prom stack. Five-minute integration. The default dashboards aren't great but the metrics themselves are useful. Reference: https://docs.getbifrost.ai/features/observability/default.
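Latency metrics scraped by Prometheus are usually histograms, so a dashboard p99 is estimated from cumulative bucket counts rather than computed from raw samples. The sketch below is a simplified re-implementation of the linear interpolation that PromQL's `histogram_quantile()` performs. It shows why such a p99 is only as precise as the bucket boundaries. It does not assume any particular metric names from Bifrost, and the bucket bounds (in seconds) and counts in the example are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with +Inf, like the `le`
    series of a Prometheus histogram. Find the bucket holding the target
    rank, then interpolate linearly inside it.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("last bucket must be +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if bound == math.inf:
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            if i == 0 and bound <= 0:
                return bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly across the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan  # unreachable when counts are cumulative


if __name__ == "__main__":
    buckets = [(0.005, 600), (0.01, 900), (0.025, 980), (0.05, 995),
               (0.1, 1000), (math.inf, 1000)]
    print(round(histogram_quantile(0.99, buckets), 4))  # 0.0417
    print(round(histogram_quantile(0.50, buckets), 4))  # 0.0042
```

Because the estimate assumes observations are spread evenly within a bucket, a p99 that lands in a wide bucket can be off by up to that bucket's width. If the latencies you care about are a few milliseconds apart, the bucket boundaries need to be that fine around them.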

What we did not use

MCP gateway, governance, SSO. Our auth sits in front of the gateway, not inside it. The custom plugins interface looked interesting but we haven't needed one yet.

Trade-offs and Limitations

Bifrost is younger than LiteLLM. The provider list is wide (23+), but if you need a niche provider, check the docs first. The plugin interface is straightforward, so you can add one yourself, but that's still work.

The web UI is decent for initial setup, but it's not where you want to manage complex governance. Configure things in YAML and version them in git like anything else.

If you're already deep in LiteLLM and using its callback ecosystem, migration cost is real. LiteLLM has more community integrations because it's been around longer. Portkey is also a fine choice if you want a managed control plane and don't want to operate a gateway yourself. Pick based on what your team will actually maintain.

Last caveat. The numbers above are from our workload. Your traffic shape will differ. Run the test yourself before deciding.
