惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Help Net Security
Help Net Security
G
Google Developers Blog
雷峰网
雷峰网
WordPress大学
WordPress大学
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Engineering at Meta
Engineering at Meta
Security Latest
Security Latest
T
Threat Research - Cisco Blogs
AWS News Blog
AWS News Blog
F
Full Disclosure
C
Cybersecurity and Infrastructure Security Agency CISA
T
The Exploit Database - CXSecurity.com
J
Java Code Geeks
U
Unit 42
C
Cyber Attacks, Cyber Crime and Cyber Security
V
V2EX
C
Cisco Blogs
博客园 - 司徒正美
Project Zero
Project Zero
L
LINUX DO - 热门话题
阮一峰的网络日志
阮一峰的网络日志
Blog — PlanetScale
Blog — PlanetScale
Scott Helme
Scott Helme
A
About on SuperTechFans
Hugging Face - Blog
Hugging Face - Blog
S
Securelist
小众软件
小众软件
aimingoo的专栏
aimingoo的专栏
S
Schneier on Security
G
GRAHAM CLULEY
酷 壳 – CoolShell
酷 壳 – CoolShell
Cyberwarzone
Cyberwarzone
MongoDB | Blog
MongoDB | Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
博客园 - 叶小钗
T
Threatpost
Recorded Future
Recorded Future
C
CXSECURITY Database RSS Feed - CXSecurity.com
宝玉的分享
宝玉的分享
N
News and Events Feed by Topic
人人都是产品经理
人人都是产品经理
The Register - Security
The Register - Security
S
Security Archives - TechRepublic
博客园 - Franky
N
News | PayPal Newsroom
Simon Willison's Weblog
Simon Willison's Weblog
S
SegmentFault 最新的问题
W
WeLiveSecurity
A
Arctic Wolf
B
Blog

Crazyrouter Blog (English)

Ideogram AI Guide 2026: Product Mockups, Text Rendering, and API Automation Akool AI Voice Generator Review 2026: API Alternatives for Developers GLM 4.6 API Guide 2026: Build Chinese-English Agents with Tool Calling Google Veo3 API Guide 2026: Batch Video Generation, QA, and Fallbacks Claude Opus 4.8 vs Opus 4.7: Real API Benchmark Results for Developers Opus 4.8 vs Opus 4.7 Coding Test: What Changed for Developers? Opus 4.8 vs Opus 4.7 for Agents: JSON, Tool Use, and Structured Output Gemini 2.5 Flash-Lite for RAG, Agent Routing, and Cost per Successful Task Gemini 2.5 Flash-Lite for Support Automation and Ticket Triage Gemini 2.5 Flash-Lite Use Cases: The Practical Automation Tier for Developers Claude Jupiter v1-p vs GPT-5.5 Benchmark: Real API Test on Reasoning and Coding Claude Jupiter v1-p vs Claude Opus 4.7 vs Sonnet 4.6: Live API Test Claude Jupiter v1-p vs Claude Opus 4.7 vs Sonnet 4.6: Live API Test Claude Code Pricing 2026: Pro vs Max vs Team vs API Costs Claude Opus 4.7 vs DeepSeek V4 Pro: Real API Compatibility and Coding Benchmark Gemini CLI Complete Guide 2026: Repo Automation, CI Agents, and Multi-Model Routing Ideogram AI Guide 2026: Brand Design Automation, API Workflows, and Alternatives GLM 4.6 API Guide 2026: Agents, RAG, Tool Calling, and Bilingual Apps WAN 2.2 Animate Tutorial 2026: Character Consistency, Shot Control, and API Workflows Google Veo3 API Guide 2026: Production Video Pipelines, Prompts, Pricing, and Fallbacks AI API Pricing Comparison 2026: Text, Image, Video, Caching, and Router Costs Codex CLI Installation Guide 2026: Windows, macOS, Linux, Proxies, and CI Setup How to Get a Claude API Key in 2026: Secure Setup for Teams, CI, and Alternatives Gemini Advanced Review 2026: Is It Worth It for Coding, Research, and API Teams? Claude Code Pricing Guide 2026: Team Agent Budgets, API Fallbacks, and Cost Control Seedance 2.0 Pricing: Convert 46 CNY per Million Tokens to Cost per Second Qwen2.5-Omni Guide 2026: Real-Time Voice, Vision, and Multimodal Agents Kimi K2 Thinking Guide 2026: Reasoning Workflows, Evals, and Cost Control Google Veo3 API Guide 2026: Batch Video Pipelines, Pricing, and Fallbacks Codex CLI Installation Guide 2026: macOS, Linux, WSL, Proxies, and Dev Containers How to Get a Claude API Key in 2026: Safe Production Setup and Alternatives AI API Pricing Comparison 2026: GPT, Claude, Gemini, Video, and Agent Workloads Gemini Advanced Review 2026: Is It Worth It for Developer Teams? Claude Code Pricing Guide 2026: API Fallbacks, Team Seats, and Budget Control Seedream 4.0 API Tutorial 2026: Batch Image Generation, Product Creative, and Pricing Qwen2.5-Omni Guide 2026: Real-Time Voice, Vision, Text Agents, and API Integration Kimi K2 Thinking Guide 2026: Reasoning Agents, Evaluation Workflows, and API Cost Control WAN 2.2 Animate Tutorial 2026: Character Motion, Shot Control, API Pipelines, and Pricing Google Veo3 API Guide 2026: Production Video Workflows, Prompts, Pricing, and Fallbacks AI API Pricing Comparison 2026: OpenAI, Claude, Gemini, DeepSeek, and Router Costs How to Get a Claude API Key in 2026: Setup, Security, Rotation, and Alternatives Codex CLI Installation Guide 2026: macOS, Linux, WSL, Proxies, and Devcontainers Gemini Advanced Review 2026: Is It Worth It for Developers and API Builders? Claude Code Pricing Guide 2026: CI Agents, Team Seats, and API Budget Planning AI API Gateway for Singapore and Malaysia Developers: One Endpoint for GPT, Claude and Gemini AI API Gateway for Thai Developers: Use GPT, Claude and Gemini with One Key One API Key for GPT, Claude and Gemini: A Practical Setup for Central Asia Developers Gemini 3.5 Flash vs Claude Response-Tier Models: Which One Should Developers Use? Gemini 3.5 Flash vs Gemini 3 Flash vs Gemini 2.5 Flash: Real API Benchmark "How to Test Multiple AI Image Models with One API Key" Codex CLI Installation Guide: Setup on macOS, Linux, Windows WSL and CI/CD Seedream 4.0 API Tutorial: ByteDance Image Generation for Production Pipelines Kimi K2 Thinking Model: Complete Developer Guide for Reasoning Workflows Luma Ray 2 Review: AI Video Generation Quality, Speed, and API Guide Pika 2.2 New Features Review: Scene Director, Sound Design, and API Updates Google Veo 3 API Guide: Video Generation with Audio for Developers AI Lip Sync Tools Comparison 2026: Best APIs for Talking Avatars and Video Dubbing Gemini Advanced Review May 2026: Is It Worth $20/Month for AI Power Users? Claude Code Pricing in May 2026: Max Plan, Opus 4, and Real Cost Breakdown Hermes Agent + Crazyrouter: One-Click Setup for 627+ AI Models Text-Embedding-3-Small: Complete Guide to OpenAI's Most Popular Embedding Model (2026) AI Meme Generator & Coloring Book Creator with GPT-image-2 — Fun Projects That Actually Make Money AI Future Baby Prediction with GPT-image-2 — See What Your Child Might Look Like Ghibli Style Photo Transformation with GPT-image-2 — Turn Any Photo Into Anime Art AI Action Figure Generator with GPT-image-2 — Turn Anyone Into a Boxed Toy AI Face Reading & Personal Color Analysis with GPT-image-2 — Two Viral Use Cases in One Guide AI Palm Reading with GPT-image-2 — Generate Professional Palmistry Analysis from a Single Photo Gemini 2.5 Flash-Lite Pricing Explained — The Cheapest Gemini Model for High-Volume Workloads Claude Sonnet 4.6 Pricing Explained — Caching, Tiers, and How to Save 45% with Crazyrouter Gemini Free vs Gemini Advanced: Pricing, Limits, Features, and Is It Worth Paying For? AI Context Window Comparison (2026): GPT, Claude, Gemini Token Limits by Model Claude Sonnet 4.5 Pricing Explained — Caching, Batch API, and How to Save 45% with Crazyrouter Claude Opus 4.7 Pricing Explained — New Tokenizer, Caching, and How to Save 45% with Crazyrouter Claude Opus 4.6 Pricing Explained — Caching, Tiers, and How to Save 45% with Crazyrouter Best AI Models for RAG Applications 2026: Embeddings, Retrieval, and Generation Seedance 2.0 vs Kling 2.1 vs Runway Gen 4 Turbo: Video AI API Comparison 2026 AI Video Generation API Pricing May 2026: Veo3 vs Kling vs Runway vs Sora How to Get Claude API Key in China 2026: Complete Setup Guide AI Coding Tools ROI Calculator: Claude Code vs Codex CLI vs Gemini CLI Cost Analysis 2026 AI API Pricing Comparison May 2026 - Complete Developer Guide Grok 4 API Pricing Complete Guide 2026 DeepSeek R2: The 32B Reasoning Model That Runs on a Single GPU — Complete Guide for Developers "GPT-5.1 Codex Max Pricing Explained — The Code-Specialized Model and How to Save with Crazyrouter" GPT-4o Pricing Explained — The Legacy Flagship That's Still Worth Using GLM-5 Pricing Explained — Zhipu AI's Flagship Model and How to Access via Crazyrouter Gemini 3 Flash Pricing Explained — Balanced Speed and Cost with Crazyrouter Savings "Gemini 3.1 Pro Pricing Explained — Context Tiers, Caching, and How to Save with Crazyrouter" GPT-5.5 Pricing Explained — OpenAI's Latest Flagship, Reasoning Tokens, and How to Save with Crazyrouter AI Model Pricing Guide 2026: What Every Model Costs on Crazyrouter (and How Much You Save) MiniMax M2 Pricing Explained — China's Competitive AI Model and How to Access via Crazyrouter Grok 4.1 Thinking Pricing Explained — Reasoning Tokens, Caching, and How to Save with Crazyrouter Grok 4.1 Pricing Explained — 2M Context, Caching, Tool Costs, and How to Save with Crazyrouter GPT-5 Pricing Explained — Reasoning Tokens, Caching, Batch API, and How to Save with Crazyrouter GPT-5-nano Pricing Explained — The Cheapest GPT Model for High-Throughput Workloads GPT-5-mini Pricing Explained — Ultra-Low Cost AI with Caching and Batch Discounts GPT-5.4 Pricing Explained — Cached Input, Context Tiers, Batch API, and How to Save with Crazyrouter GPT-5.2 Pricing Explained — Caching, Batch API, and How to Save with Crazyrouter OpenRouter vs Crazyrouter (2026): Pricing, Models, and Which API Gateway Fits Developers Better Suno v4 vs v5 vs v4.5: Which Version Sounds Better and Is Worth Using in 2026? How to Use Claude Code with Crazyrouter: Base URL Setup, Model Routing, and Cost Savings
Claude Code Builds a Multi-Model Odds Alert Router: claude-fable-5 vs GPT-5.5 vs Qwen
Crazyrouter Team · 2026-06-13 · via Crazyrouter Blog (English)

Claude Code Builds a Multi-Model Odds Alert Router: claude-fable-5 vs GPT-5.5 vs Qwen#

The previous project in this series built a World Cup odds movement monitor with Claude Code and claude-fable-5.

That project answered one question:

Can Claude Code build a monitoring pipeline and use claude-fable-5 to summarize odds alerts as valid JSON?

The next question is more important for production:

What happens when the model fails?

So this third project turns the odds monitor into a multi-model alert router.

Instead of trusting one model, we send the same structured task through several routes on Crazyrouter:

  • claude-fable-5
  • gpt-5.5
  • qwen-plus
  • gemini-2.5-flash

Then we measure:

  • HTTP status;
  • latency;
  • token usage;
  • valid JSON;
  • required schema keys;
  • fallback order.

This is still an analytics engineering demo. It is not betting advice.


Why this topic matters#

Most AI examples stop at a successful single model call.

That is not enough for real systems.

If your application depends on structured output, the real question is not:

Which model sounds smartest?

The real question is:

Which model returns a usable object for this exact workflow?

For an odds alert dashboard, the output must be machine-readable. A beautiful paragraph is not enough. The application needs valid JSON with expected keys.

So the router treats these as failures:

  • HTTP error;
  • invalid JSON;
  • missing required keys;
  • timeout;
  • output wrapped in a format the parser cannot handle;
  • truncated JSON.

That is the difference between a demo and a production workflow.


Input: the same odds alerts as before#

The input comes from the previous odds movement monitor.

The Python script converted decimal odds into implied probability changes and flagged movements above a threshold.

Example alerts:

The router task is not to predict match results.

The task is to summarize the alerts as a safe engineering report.

Required JSON keys:


Crazyrouter setup#

The test used the same OpenAI-compatible API base URL:

The request shape was intentionally compact:

The prompt explicitly required:

The router then attempted to parse each response and check required keys.


The benchmark result#

Here is the real test result:

ModelHTTPLatencyTotal tokensValid JSONResult
claude-fable-54001.09sFalseInvalid request
gpt-5.52008.07s950TrueValid fallback
qwen-plus2005.68s601TrueBest primary
gemini-2.5-flash2004.70s1020FalseTruncated JSON

The router recommendation was:

This is exactly why model routing matters.

The fastest HTTP response was not the best production route. Gemini responded quickly, but produced invalid JSON. claude-fable-5 had worked in the previous article with a slightly different payload, but returned HTTP 400 here.

For this exact task, qwen-plus won because it returned valid JSON faster than gpt-5.5.


What qwen-plus returned#

The qwen-plus response passed all required keys:

That is not a betting recommendation. It is a data-quality and monitoring summary.


What GPT-5.5 returned#

gpt-5.5 was slower but also valid.

Its output included stronger caveats:

This makes gpt-5.5 a good fallback candidate.

If the primary route fails, it can provide a more conservative explanation.


Why claude-fable-5 failed here#

This is the most interesting part.

In the previous project, claude-fable-5 successfully returned valid JSON when the request was compact and tuned for that model.

In this router benchmark, the request used the same payload shape across all models.

claude-fable-5 returned:

That does not mean the model is bad.

It means payload compatibility is part of production model quality.

A model can be useful in one request shape and fail in another. If your application routes dynamically, the router must understand those differences.

This is a very practical lesson:


Why Gemini failed here#

gemini-2.5-flash returned HTTP 200, but failed JSON parsing.

The content started like valid JSON but was truncated:

That is a different failure mode from claude-fable-5.

One model failed at the request layer.

Another model failed at the output layer.

The router must treat both as failures.

This is why HTTP status alone is not enough.


Router rule#

The router rule for this demo is simple:

Pseudo-code:

This is boring code, but it is what makes AI workflows usable.


Cost per valid output beats cost per token#

A pricing page tells you cost per token.

A production workflow cares about cost per valid output.

Those are not the same.

A cheap model that returns invalid JSON may trigger retries and fallback calls. A more expensive model may be cheaper for the actual workflow if it succeeds more often.

For this benchmark, the router would choose:

That does not mean Qwen is always better. It means Qwen was better for this exact payload and schema.

That is the point.


What Claude Code built#

Claude Code’s role here is not to pick a favorite model.

It should build the router and the evidence trail:

This gives you:

  • raw responses;
  • latency records;
  • token usage;
  • parse results;
  • schema validation;
  • routing recommendation.

That is much more valuable than a single polished answer.


Why Crazyrouter is useful here#

Without an API gateway, this benchmark would require separate provider integrations.

With Crazyrouter, the test uses one interface:

That makes it practical to route by task, not by brand loyalty.

For example:

  • use qwen-plus for fast structured alert summaries;
  • fallback to gpt-5.5 when stricter explanation is needed;
  • tune claude-fable-5 with a compatible payload for tasks where it performs well;
  • reject any model output that fails validation.

This is how multi-model applications should be built.


Final takeaway#

The lesson from this project is simple:

In production AI, the best model is the one that returns an accepted output for the task.

Not the most hyped model.

Not the model with the fastest HTTP response.

Not the model you personally prefer.

For this odds alert router, the winner was qwen-plus, with gpt-5.5 as fallback. claude-fable-5 remains useful, but this payload needs tuning. gemini-2.5-flash was fast but invalid for the JSON workflow.

That is exactly why routers exist.

If you are building Claude Code projects that need structured output, model comparison, and fallback routing, try Crazyrouter:

https://crazyrouter.com?utm_source=crazyrouter_blog&utm_medium=article&utm_campaign=claude_code_odds_router