惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

aimingoo的专栏
aimingoo的专栏
量子位
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
S
Schneier on Security
Cisco Talos Blog
Cisco Talos Blog
T
ThreatConnect
J
Java Code Geeks
博客园 - 司徒正美
A
Arctic Wolf
T
True Tiger Recordings
C
Cybersecurity and Infrastructure Security Agency CISA
Cyberwarzone
Cyberwarzone
Know Your Adversary
Know Your Adversary
T
Threat Research - Cisco Blogs
V
Vulnerabilities – Threatpost
Recorded Future
Recorded Future
P
Palo Alto Networks Blog
The Hacker News
The Hacker News
The Register - Security
The Register - Security
S
Securelist
www.infosecurity-magazine.com
www.infosecurity-magazine.com
C
CXSECURITY Database RSS Feed - CXSecurity.com
Application and Cybersecurity Blog
Application and Cybersecurity Blog
I
Intezer
P
Privacy & Cybersecurity Law Blog
Scott Helme
Scott Helme
K
Kaspersky official blog
博客园 - 聂微东
Last Week in AI
Last Week in AI
V
V2EX
小众软件
小众软件
F
Fox-IT International blog
Martin Fowler
Martin Fowler
Apple Machine Learning Research
Apple Machine Learning Research
T
Tenable Blog
F
Future of Privacy Forum
Microsoft Security Blog
Microsoft Security Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
腾讯CDC
Stack Overflow Blog
Stack Overflow Blog
C
Check Point Blog
阮一峰的网络日志
阮一峰的网络日志
GbyAI
GbyAI
T
Threatpost
I
InfoQ
P
Proofpoint News Feed
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
T
Tor Project blog
G
GRAHAM CLULEY
D
DataBreaches.Net

DEV Community

Designing Configuration for Scalable Treasure Hunts SSH Login Delays: The 10-Second Wait That Drives Us Crazy Building Production Multi-Agent Workflows in n8n: What 50 Deployments Taught Us A 3-layer memory system that gives Claude Code persistent context across sessions. Trishul SNMP Suite 2.0.1: Better MIBs, Traps, and SNMP Labs How I built a production AI SaaS as a solo developer India’s Laws Were Not Built for AI — And Courts Are Filling the Gap skill-insp: A Skill That Scores Other Skills Clprolf Minimalist Messaging in the Age of AI What's actually in a good .cursorrules file? I built 10 of them — here's what I learned Building Strong Python Basics – Loops, Functions and Logic How to Choose the Right Tech Stack for Your Project I built a free multi-tab JSON editor — here's what I learned HTTP Headers Every Developer Should Know (2026) Building Cross-Platform Digital Products: Challenges and Best Practices Data Privacy in the Age of AI: How Product Teams Can Build Trust with Users What Would WordPress Look Like If It Were Designed Today? Why Backup Success Does Not Mean Database Recoverability Local AI Office Assistant That Never Sends Your Documents to the Cloud Building TaskForge: Translating Enterprise Chaos into an Open-Source Scheduler Tesla P40 in a Homelab: 24GB of Inference on a Budget Llama 4: Meta's Latest — Scout, Maverick, and the MoE Revolution George Hotz called AI code 'slop.' He's half right. Como Construir um Fluxo de Trabalho Baseado em Engenharia de Prompt e Automação We Audited Our Agent Tool-Call Traces. Half Our Eval Data Was Garbage. The Hidden Cost of Downtime: How SRE Error Budgets Protect National Economic Infrastructure Getting started with openHUMANS can be an exciting venture for developers looking to create innovative applications in the realm of human-ce Stack Overflow: A Powerful Community for Developers and Learners From Language Models to Humanoid Minds ✨ Road to Senior #2: How Computers Think in Numbers Why LLM debugging fails on fragmented repository context How to Deploy a LangGraph Agent on AWS Bedrock AgentCore An outreach kit for solo founders whose drafts can't hallucinate Open Satchel is live Amy Kwalwasser and the Growing Importance of Quantum Risk Modeling I Built ShellReq - A Native API Client for VS Code & Terminal If Microsoft and Uber can't afford AI coding, what chance do the rest of us have? MADCAP: Building a Multi-Agent Debate CLI That Argues With Itself So You Don't Have To Why most AI fails at IDOR (and how AMAS fixes it with causal reasoning) How to Audit a Laravel Codebase You've Inherited LangGraph 워크플로우 템플릿 (v34) BugBench: a developer origin story and practical guide for VS Code / Kiro users A solution to messy token systems for Next.js A NestJS reference app that proves the nest-native stack under realistic backend pressure Observability for AI Systems: Monitoring Drift, Hallucinations, and Reliability in Production I Thought “Data Analyst” Was the Whole Game… Then I Entered the Data Avengers Office 👀 Create and configure network security groups How to analyze the cost of Kafka? How I Shipped 2,500+ Commits With AI Agents Using a 12-Phase Workflow [Boost] We built MDCMS, a Markdown-first CMS for teams using AI agents Zero Heap Allocations at 1.18 GB/s: Deep Dive into ForgeZero 4.0.x The Minimum Viable Test Suite for Working with Agents Why Perplexity Started Citing My Blog: 5 Changes That Actually Worked Sync Supabase via OAuth: No Connection String Needed I asked three AI models the same API question. Only one had it right. Implementing Saga Pattern With Lambda Durable Function Why does AI forget what you said (and how to fix it) I built a daily Wordle-style game for AI tools - Here's how Mapping Polish company structures: querying KRS direct via API Built tmpdrop — a tiny self-hosted ephemeral file drop Running Local LLM - 0$ Personal Agentic AI Assistant - Part 3 LLD Object-Oriented Design: Interfaces & Abstract Classes (Designing Contracts) The Smaller Ship: Vitalik, the Ethereum Foundation's Restructuring, and What It Leaves for Investors Looking for 4 people to build something weird with me Building a Local-Only RAG System with Ollama and TypeScript The False Positive Tax: a 1:1 TP:FP analysis of eslint-plugin-security What's new in Data Preprocessor 1.5.x — R codegen, Robust Scaler, and a deadlock post-mortem How I self-hosted my Flask app on an old laptop for almost free I built a free DSA interview prep site because I was tired of the existing options I built an AI agent that migrates Next.js Pages Router to App Router Prisma Query Logging and PostgreSQL: Where the ORM Ends and the Database Begins Prisma query logging y PostgreSQL: dónde termina el ORM y empieza la base From Browser to Server : The Journey of an HTTP Request (Demystifying the Web’s Infrastructure) Santa Augmentcode Intent Ep.6 I Benchmarked 17 ESLint Security Plugins. Only One Found Every Vulnerability. How to Build a High-Performance Image Optimization Pipeline in 5 Minutes 50 Linux Commands Every DevOps Engineer Must Know Less Toil, More Flow - Automating the Path from Request to Implementation The Code Review Checklist I Actually Use How I run a small blog on Astro 5 + Content Collections Git: Best Practices for Professionals How IBM Bob Became My Everyday Coding Companion Solana Passkey Wallet: Replacing Seed Phrases with SIMD-0075 I built a small browser puzzle game about arrows I wrapped Claude Code in a zsh function. Here's every decision I almost got wrong. Mobile Game Optimization: A Unity Developer's Checklist Git: Best Practices for Beginners Three days I lost chasing a ghost that was already dead on disk Why Too Many Parts Hurt ClickHouse Performance Guardrails for Agent Output: Pluggable Validation Before and After LLM Calls Gemma Forge: Local AI Without the Setup Wall From Half‑dead Prototype to Local‑Only AI Medical Assistant: Rewiring MedClinic with GitHub Copilot Runninig a forkbomb in Jenkins What’s Actually Happening When You Use Git Preventing Recursive Tool Loops in LangChain Agents Building a Rock-Paper-Scissors CLI with TypeScript — Union Types, Conditionals, and Jest Your AI Coding Agent Wastes 80% of Its Context. Fixed That with Graph Theory. Why Flutter Has Become the Go-To Framework for Fintech App Development
Auto-labelling 1.2M robotics frames with VLMs: a failover story
Marco Rinald · 2026-05-26 · via DEV Community

Marco Rinaldi

TL;DR: We needed to caption 1.2M reconstructed event-camera frames using vision-language models for auxiliary supervision. The first run died at 340K from Anthropic rate limits. Putting Bifrost in front of three VLM providers cut the rerun cost by 22% and finished in 9 hours.

So, the thing is, when you work at a neuromorphic vision startup, your training data looks strange. At Prophesee we accumulate event streams into time-binned windows that we render into pseudo-frames. For a self-supervised pretraining run on a new asynchronous backbone, we wanted natural-language captions on every window. Not because we're going language-first. The captions act as auxiliary targets for a contrastive head that sits alongside the actual event tensor.

1.2M frames. Three candidate VLMs: GPT-4o, Claude 3.7 Sonnet, Gemini 2.5 Pro. All three caption our weird greyscale reconstructions differently enough that we wanted a mix per frame.

I tried Anthropic first because the captions were qualitatively the best on our pilot set. Job died at 340,317 captions on a sustained TPM cap. That was a Friday evening before a long weekend in Bologna. I lost the weekend.

Choosing a gateway over more retry code

My first instinct was to write a smarter retry loop. Every CV engineer has this instinct when they discover REST APIs aren't deterministic. After about three hours of writing what was clearly going to become a half-baked rate-limit handler with provider-specific quirks, I stopped.

The actual problem was that I had multiple providers, all with their own SDKs and their own error formats. I needed something in the middle that knew about quotas, retries, and fallback chains, and that wasn't going to require me to learn yet another vendor lock-in.

I looked at LiteLLM, Portkey, and Bifrost. Ended up running Bifrost in Docker on the same node as the batch dispatcher.

The setup

Bifrost runs as a single Go binary or container. The config that mattered for us was the fallback chain. Here's the trimmed version we shipped:

providers:
  openai:
    keys: [${OPENAI_KEY_1}, ${OPENAI_KEY_2}]
    weight: 0.5
  anthropic:
    keys: [${ANTHROPIC_KEY_1}]
    weight: 0.3
  vertex:
    keys: [${VERTEX_KEY_1}]
    weight: 0.2

fallbacks:
  - model: openai/gpt-4o
    next: [anthropic/claude-3-7-sonnet, vertex/gemini-2.5-pro]
  - model: anthropic/claude-3-7-sonnet
    next: [openai/gpt-4o, vertex/gemini-2.5-pro]

Enter fullscreen mode Exit fullscreen mode

Our batch dispatcher called http://bifrost:8080/v1/chat/completions with whatever model we picked for that frame. If a provider was over quota, Bifrost handled the failover and the dispatcher never saw the error. That part is documented under retries and fallbacks.

We also turned on semantic caching for the prompt template because we caption a lot of near-identical static scenes. Robotics demos have long boring stretches. Cache hit rate landed around 14% on the full run, which isn't huge but covered the cost of running the gateway itself.

How it compared

Concern LiteLLM Portkey Bifrost
Multi-provider failover Yes Yes Yes
Self-hosted in our VPC Yes Paid tier Yes (Docker)
Semantic caching built-in Plugin Yes Yes
Prometheus metrics native Partial Yes Yes
Single binary deploy No (Python) N/A (SaaS) Yes (Go)
800 req/s sustained GIL issues N/A Held

LiteLLM was the most familiar option for our team because we already use it for eval scripts. Honestly for offline single-process work it's fine. The problem hit us when we tried to push sustained throughput through one Python process. Bifrost being Go meant we didn't fight the GIL. Portkey's hosted product is genuinely nice and the analytics UI is better than what Bifrost shipped, but we needed everything inside our VPC for frames covered by client confidentiality.

Results

The full 1.2M caption run finished in 9 hours and 14 minutes. Total cost was $4,180, down from a projected $5,360 if we'd run everything on GPT-4o. The 22% saving came from routing roughly a third of traffic to Gemini, which is cheaper per token for our prompt length.

Two providers had transient 429 spikes during the run. I didn't have to do anything about either. The gateway absorbed them. I noticed only because the per-provider request graph in the Bifrost dashboard had a visible dip on Anthropic around hour four.

Trade-offs and limitations

Not everything was clean.

Latency overhead. Bifrost adds a hop. For batch labelling it didn't matter. For an interactive vision app streaming a webcam, I'd benchmark carefully before putting any gateway in the path.

Caption drift across providers. Captions from Gemini and Claude are stylistically different even with the same prompt. We had to normalise downstream with a small T5 rewriter. The gateway doesn't solve this for you.

Config sprawl. Once you have weights, fallbacks, virtual keys, and cache rules in one YAML, it gets hard to reason about which path a given request actually took. Bifrost's logging helped but I had to dig.

MCP and tool use. We didn't need them. If you're building an agent product instead of a labelling pipeline, the MCP support might matter more than failover.

What I'd do differently

Run a 5K-frame pilot before launching the full job. We did 50K, which was enough to catch the rate-limit issue conceptually but not enough to see what 800 req/s sustained does to a Python process. Also: drink the espresso before debugging gateway configs at 1am, not after.

Further reading