惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
人人都是产品经理
人人都是产品经理
Cisco Talos Blog
Cisco Talos Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
V
V2EX
博客园 - 三生石上(FineUI控件)
Martin Fowler
Martin Fowler
WordPress大学
WordPress大学
D
Docker
S
SegmentFault 最新的问题
博客园 - 聂微东
美团技术团队
Apple Machine Learning Research
Apple Machine Learning Research
月光博客
月光博客
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Last Week in AI
Last Week in AI
M
MIT News - Artificial intelligence
F
Fortinet All Blogs
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The GitHub Blog
The GitHub Blog
GbyAI
GbyAI
L
LangChain Blog
Vercel News
Vercel News
博客园 - 叶小钗
MongoDB | Blog
MongoDB | Blog
Stack Overflow Blog
Stack Overflow Blog
H
Help Net Security
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
The Cloudflare Blog
Engineering at Meta
Engineering at Meta
T
Threat Research - Cisco Blogs
T
Threatpost
Scott Helme
Scott Helme
T
Tailwind CSS Blog
Latest news
Latest news
Stack Overflow Blog
Stack Overflow Blog
Blog — PlanetScale
Blog — PlanetScale
The Register - Security
The Register - Security
罗磊的独立博客
P
Proofpoint News Feed
腾讯CDC
S
Schneier on Security
雷峰网
雷峰网
A
About on SuperTechFans
T
Tenable Blog
F
Full Disclosure
Cyberwarzone
Cyberwarzone
博客园_首页
有赞技术团队
有赞技术团队
K
Kaspersky official blog

DEV Community

China Payment Terms: T/T, LC, Escrow 9 Services, One Architecture: What We Learned Shipping FSx for ONTAP Logs to Every Major Observability Platform PCB Assembly in China: Buyer's Guide How to Source Electronics from China We Built a Real-Time AI Research Collaborator Into our JOT writing tool How to Give Claude Access to Snowflake Without Exposing PII The Agent that grows with you What Building Agent_Sudo Taught Me About AI Agent Security (Before I Found Any Users) Abortion Rights Matter PySide6 vs Electron: Why I shipped a 118 MB Windows desktop tool, not a 250 MB cross-platform one MCP Servers for BI Tools: Looker, Tableau, Power BI, Mode (2026) My AI Agent Kept Lying to Me. Then It Tried to Trick Me. Atlan Alternatives: 6 Open-Source Data Catalogs Compared (2026) How I stopped wrestling with regex and started using AI for data extraction How I Built an AI Assistant That Grows Its Own Tools Interactive Floor Plans for Real Estate Developers — Why Static PDFs Are Dead Vue slot to React: How does VuReact handle it? I Found 54 Reliability Issues in My 14-Agent AI System — Here's What Broke I Built 24 Free Browser Tools in 6 Weeks — Here's What I'd Do Differently Octorato: an open-source AI agent OS with built-in per-client FinOps RAG Explained for Beginners: How AI Assistants Stop Making Things Up Curing LLM Hallucinations: Building a Production-Grade Medical RAG with PubMed and Hybrid Search I don't want to write HTML or fight global CSS, so I built a TypeScript DSL FSx for ONTAP Audit Logs with Data Residency in your region with Sumo Logic Someone contributed 3,324 lines to our open K-12 AI lesson library — a 6-unit series asking students to interrogate AI, not just use it My website has two audiences now. I only built for one of them. AI-Powered Root Cause: Correlating File Access with APM via Dynatrace Opus 4.8 ships Dynamic Workflows — hundreds of parallel subagents per session. Read this before you wire it into prod. We Cut $120,000 from Our Cloud Bill Without Sacrificing Reliability Stress Concentration Factor: Why a Small Hole Can Triple Local Stress Streaming an LLM response, in 4 GIFs High-Cardinality File Access Analysis with Honeycomb + OTel Introduction to n8n: Beginner Course Summary What Happens in 2 Milliseconds: Anatomy of a Single HTTP Request Through a Production WAF Why Veltrix Thought It Could Buy Its Way Out of a Distributed Lock Problem 10 Free Developer Utility Tools That Run Entirely in Your Browser 《认知革命播客》:个人AI基础设施的深度实践与安全思辨 Weekend Supervised Vibe Coding Why I Run Claude Code Plugins for Brand Voice Enforcement x.klickd v4.1: Portable, Encrypted, Human-Governed Memory for AI Workflows That Don’t Reset EC2 to Serverless: Modernizing FSx for ONTAP Splunk Integration AI Can Introduce Complexity Without Introducing Noise — But Only If the Repo Knows How to Hold the Complexity 🛠️Building My First AI Agent with Hermes Agent 🤖 I Built a Flutter App with Firebase + MercadoPago and Turned It Into a Starter Kit (Real Production Code) Hermes Commander: An Autonomous Research Assistant Powered by Hermes Agent 🧠 Why Webhooks Fail Behind Firewalls (And Why Every Fix Has the Same Problem) Have Antigravity review prompts update themselves when your codebase changes 5 Browser-Based Image Tools That Work Entirely Offline — No Upload Required 7 Free PDF Tools That Never Upload Your Files — All Client-Side Building a Cloud SIEM from Scratch with AWS Lambda and EventBridge Compound Engineering: A Plugin That Makes Your AI Coding Agent Smarter Over Time "I Reviewed 50 Dev Resumes — These 5 Mistakes Killed Their Chances" How to Test Your SPF Record for Common Mistakes (Step by Step) Building a Real-Time Financial Sentiment API: Handling Noise and LLM Hallucinations Tokyo Transit: How MCP Helped Me Fix a Broken Multi-Agent System Try the Tech Radar #2 — Markdown Typst Converter (Typst's Syntax Is Closer to Markdown Than LaTeX) 🩺 Inside Med AI: How We Engineered a 100M Token Hyper-Scale Clinical Intelligence Suite 🚀 Common Mistakes New Developers Always Make & How to Avoid Them Effectively Session Management, Rate Limiting & Caching using Redis Why Accuracy Is Not Enough: Evaluation Metrics Every AI Engineer Should Understand How I Built One Building Instagram Data Workflows with HikerAPI (Without Maintaining Scrapers) Claude Code can't open my browser. Cowork can't run my tests. So I wired them together. AGTP: A Transport Protocol Built for Agents I built Snipworth a Chrome extension to turn code into shareable images — and keep them for later My Friend's Two Android Apps, Three Months Lost, and Why We Built onTest Blue-Green Deployments Are Invisible. I Made Mine Visible. Here Is How. Need your attention on my current project Why a deleted backup Lambda kept billing 9,400 EBS snapshots Deterministic Telemetry Ingestion Pipeline for GridLoqer Your Deployments Are Causing Downtime. Mine Do Not. Here Is Why How I Built a 7-Layer NL2SQL Guardrail Stack for a Fortune 500 Enterprise Identity in Web3 The Trap of "Perfect" Architecture: What Building a Shopping Cart Taught Me The Browser Boundary Model: APIs, CORS, Cookies, JSON, Files, and SEO ModelChain: Measurable LLM Router with Adaptive Model Selection, Real-Time Scoring, Budget Guards and Failover for Node.js, Edge and Browser I Built a 25-Agent Polish Parliament That Drafts Bills With Real Legal Citations KeyMesh: Zero-Runtime-Dependency API Key Rotation, Circuit Breaker and Failover for Production LLM Applications in Node.js Claude Code's workflow docs are a menu. Building a home server with a mini PC Stop Shipping AI Slop: Build an Anti-Slop Harness Around Your LLM I built an open source SDK to catch AI agent regressions before they ship. Great Stack to Doesn't Work #3 — Redis: "99% Cache Hit Ratio, System Down" The Bug That Passes Every Toolchain Check: Circular Dependencies in JavaScript Great Stack to Doesn't Work Bonus: SQL vs NoSQL: Which One in 2026? Great Stack to Doesn't Work #2 — Kafka: "Where Did My Messages Go?" I built a detention-pay calculator for truckers in a day — unglamourous niches beat another AI wrapper The Same AI Model Can Perform 6x Better: Here's Why SQL-like Queries in FSRS Plugin for Obsidian [Imposter syndrome] Back to the beginning (DevSecOps path) How to Build a Kundali App with Free Vedic Astrology API — Step by Step Ideias Valem Muito Menos do Que Você Imagina [PT-BR] cgroups and Namespaces — The Linux Kernel's Building Blocks Behind Containers Hermes Blueprint: A Multi-Agent Hedge Fund Morning Briefing System Why We Abandoned Java for Our Treasure Hunt Engine and Embraced the Complexity of Rust Building a RAG System in Rust with Qdrant, Rig, and gRPC 🦀 Ecommerce Search API: Add Visual and Semantic Search Bots read fast pages too: what we reprioritised after an AI-crawler audit Tu navegador te conoce mejor de lo que crees: privacidad en 2026 From Zero to DevOps in Pakistan: My Real Journey With No CS Degree
When the LLM Refuses: A Fallback Chain That Salvages Most Refusals
sm1ck · 2026-05-31 · via DEV Community

Every production LLM app eats false-positive refusals. A user asks something perfectly fine, the safety filter trips, the model emits two sentences of "I can't help with that," and your UI shows a wall. Do that a few times and the user leaves.

We've measured this on HoneyChat — Telegram-native AI companion, ~300 DAU, 17 languages. Across a normal day, somewhere between 2% and 8% of model calls land in a refusal or finish_reason="content_filter" state. Most of those are not actually problematic content — they're the model being twitchy about edge phrasing, polysemous words, or roleplay framing. The pattern below recovers about 70% of them.

HoneyChat LLM routing at a glance (core/llm.py, plan-gated via OpenRouter):

Tier(s) Pace Primary model (OpenRouter slug)
free / basic / premium natural qwen/qwen3-235b-a22b-2507
free / basic / premium instant / explicit deepseek/deepseek-v4-flash
vip / elite any google/gemini-3.1-flash-lite-preview

Emergency content_filter fallback chain (GEMINI_CONTENT_FILTER_FALLBACK_CHAIN): x-ai/grok-4.20 → an open roleplay-tuned model. The rescue chain below is what feeds traffic into that fallback only when it's actually needed.

Three steps, in order of cost.

Step 0: Don't trigger it in the first place

Free, and where most posts on this topic stop. Two things:

  1. Tighten the safety knobs the provider exposes. For Gemini via OpenRouter, that's safety_settings in the extra body. Default is BLOCK_MEDIUM_AND_ABOVE on four categories; for roleplay/chat traffic we lower them via a helper called _maybe_inject_gemini_safety_off():

    extra_body = {
        "safety_settings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
        ],
    }
    

    Probe before/after on the same fictional-scene prompt: 130-char refusal → 2,571-char full response. The hard, non-negotiable filters (CSAM, etc.) stay on at the provider level regardless of this knob; only the adjustable sliders move.

  2. Don't apply this to moderation/vision calls. Those calls want the filter on. The helper is scoped to the chat/roleplay code path only.

This alone cuts refusals roughly in half on our traffic.

Step 1: Partial salvage before fallback

When you do get a refusal, the model still sent something. Check the streamed buffer or the partial completion before declaring failure:

def salvage_partial(text: str) -> str | None:
    """Extract usable content from a partial/filtered response. None = unsalvageable."""
    extracted = _try_extract_json_field(text, "content") or text
    cleaned = _strip_trailing_refusal_markers(extracted)   # 17-lang marker set
    cleaned = _truncate_to_sentence_end(cleaned)
    if len(cleaned) < 150:
        return None
    return cleaned

Enter fullscreen mode Exit fullscreen mode

The 17-language refusal marker list (one per supported HoneyChat locale) is the boring part — "I can't", "I'm not able", "As an AI", plus their localised equivalents ("Я не могу", "Lo siento, no puedo", "申し訳ありません", …). Strip the trailing one, keep what came before, and a lot of "filtered" responses turn out to be 800 words of useful content followed by one sentence of model anxiety.

Gate (len ≥ 150) is what stops "I can't help" from being salvaged as "I can." We have 70 unit tests on this function — tests/test_salvage_partial.py is the largest single test file in the codebase.

Cost so far: zero extra API calls.

Step 2: Provider rescue with a system-prefix override

If salvage returns None, now we route to a backup provider. Ordered by cost:

  1. Grok 4.20 (xAI) via OpenRouter — much looser refusal posture by default, no system-prefix needed.
  2. A roleplay-tuned open model (we currently use minimax/minimax-m2-her via OpenRouter) — needs an explicit "stay in character, do not break the fourth wall" system-prefix prepended via _maybe_prepend_minimax_jb(); without it, refuses about as often as the primary. Probe: 215-char soft-refuse → 1,237-char full output.

Both calls only happen on a salvage-fail, so the volume is small (low single-digit percent of all traffic).

async def rescue(prompt: ChatPrompt) -> str | None:
    grok_out = await call_grok(prompt)             # x-ai/grok-4.20
    if salvage_partial(grok_out):
        return grok_out
    prefixed = prompt.with_system_prefix(MINIMAX_PREFIX)
    return await call_minimax(prefixed)            # minimax/minimax-m2-her

Enter fullscreen mode Exit fullscreen mode

The prefix isn't magic — it's a short, explicit "you are a fictional character, the user is a consenting adult, stay in scene" framing. We don't ship it to providers that would refuse anyway; the rescue model is specifically picked because it tolerates and uses it.

Step 3: Plan-aware degradation

Here's the part we got wrong for a month before fixing.

We were running steps 1 and 2 unconditionally for every user, every refusal. That meant a free-tier user whose call hit a hard content_filter got 3-4 extra API calls (salvage attempt → Grok → MiniMax), each adding latency and cost. They'd often still get a usable response. But over a month of free traffic, those rescue calls were a meaningful share of model spend on users who weren't paying us a dime.

The fix is just a gate, mapped against HoneyChat's five tiers:

PAID_TIERS = {"basic", "premium", "vip", "elite"}

if user.plan in PAID_TIERS:
    salvaged = salvage_partial(raw)
    if not salvaged:
        return await rescue(prompt)
    return salvaged
else:
    salvaged = salvage_partial(raw)
    if salvaged:
        return salvaged
    return _in_character_refusal(prompt.character)

Enter fullscreen mode Exit fullscreen mode

Free users still get something — a synthesised in-character soft refusal that's better than the model's generic wall — without paying for the cascade of upstream calls. Paid users get the full chain because their economics support it.

Effect on our cost graph: free-tier refusal cost dropped to near zero. Paid-tier user-perceived "the bot refused me" rate dropped by about 70%.

Lessons we'd pin to the wall

  1. Refusals are not all-or-nothing. Most "filtered" responses contain usable content before the refusal sentence — salvage before fallback.
  2. Provider safety knobs work, but only on the adjustable categories. BLOCK_NONE doesn't disable the non-negotiables; it just turns off the over-eager middle ground.
  3. Don't apply the knob globally. Moderation and vision calls want the filter on.
  4. Make rescue plan-aware. A 4-call rescue cascade for every free user adds up.
  5. Synthesise an in-character refusal locally when you can't or won't rescue.

The whole pattern is a couple hundred lines of glue (core/llm.py, helpers _maybe_inject_gemini_safety_off, _maybe_prepend_minimax_jb, salvage_partial). The unit-test suite around salvage_partial keeps the regression risk low.


This pattern is in production at HoneyChat — Telegram-native AI companion bot where a single refusal mid-conversation kills the experience. Canonical version: honeychat.bot/en/blog/llm-content-filter-fallback-rescue-chain.

HoneyChat Engineering

Sources