Faithfulness gate: the agent layer most teams skip
SapotaCorp · 2026-05-24 · via DEV Community

A B2B SaaS team got an angry email from a customer last quarter. The customer's account team had asked the company's AI assistant whether their plan included SSO. The assistant said yes. The customer's IT team spent two days trying to configure it, escalated to support, and discovered the assistant had been wrong. SSO was on the Enterprise tier. The customer was on Pro.

The assistant had searched the documentation, found nothing definitive about which tiers included SSO, and produced a fluent answer based on what seemed plausible from training data. The user had no way to know it was a hallucination.

The fix was not "a better model." A larger LLM would have hallucinated more confidently with the same insufficient context. The fix was a layer that should have been there from day one: a faithfulness gate that checks whether the agent's response is actually grounded in the retrieved context before shipping it to the user.

This is one of the highest-leverage interventions for production AI agents. Most teams skip it because the failure mode is invisible until a customer complains.

What faithfulness actually measures

Faithfulness is a single question: does the agent's response make claims that are supported by the context the agent retrieved?

If the agent searched the KB and found "Pro tier includes basic features X, Y, Z. Enterprise tier includes X, Y, Z plus advanced features A, B, C, including SSO," then a response saying "your Pro plan includes SSO" is unfaithful. The retrieved context does not support that claim.

This is different from "is the response correct." Correctness requires ground truth. Faithfulness only requires the retrieved context. You can check it without a human in the loop.

The mechanic: extract atomic claims from the response, check each claim against the retrieved context, return a score. Below threshold, the response is unfaithful and should not be shipped.

How the gate actually works

The pattern is straightforward:

  1. Agent generates a response based on retrieved context
  2. A separate LLM call (the "judge") extracts the atomic claims from the response
  3. For each claim, the judge checks whether the retrieved context supports it
  4. The faithfulness score is the fraction of claims supported
  5. If the score is below threshold (we default to 0.85), the response is rejected
  6. The agent either retries with revised context or returns "I cannot answer this confidently from available information"
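
The scoring and threshold steps above can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the function names are hypothetical, and `toy_judge` is a substring match standing in for the LLM judge call so the example runs without any API access. Treating a response with no claims as fully faithful is also an assumption you may want to change.

```python
from typing import Callable, Sequence


def faithfulness_score(
    claims: Sequence[str],
    context: str,
    is_supported: Callable[[str, str], bool],
) -> float:
    """Return the fraction of claims the judge marks as supported by the context."""
    if not claims:
        # Assumption: a response with no factual claims is treated as faithful.
        return 1.0
    supported = sum(1 for claim in claims if is_supported(claim, context))
    return supported / len(claims)


def passes_gate(score: float, threshold: float = 0.85) -> bool:
    """A response ships only if its score reaches the threshold."""
    return score >= threshold


def toy_judge(claim: str, context: str) -> bool:
    # Stand-in for the LLM judge: a naive case-insensitive substring match.
    # A real judge would reason about meaning, not exact wording.
    return claim.lower() in context.lower()


context = "Pro tier includes features X, Y, Z. Enterprise tier includes SSO."
claims = ["Pro tier includes features X, Y, Z", "Pro tier includes SSO"]

score = faithfulness_score(claims, context, toy_judge)
print(score, passes_gate(score))
```

One of the two claims is unsupported, so the score is 0.5 and the response is rejected at the 0.85 default.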

Frameworks like Ragas implement this directly. You can also build it yourself with a single LLM call using a structured prompt. The judge model does not need to be the production model. We typically use GPT-4o-mini or Claude Haiku for the judge to keep costs low; they are accurate enough for this task.
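
For the build-it-yourself route, one possible shape of the single structured call is sketched below. The prompt wording, the JSON reply format, and the function names are all assumptions made for illustration; the sketch deliberately stops short of calling any provider SDK, so you would send the prompt with whatever client you already use and pass the raw reply to the parser.

```python
import json

# Hypothetical prompt template; doubled braces are literal braces for str.format.
JUDGE_PROMPT = """You are checking an answer against retrieved context.
1. List every atomic factual claim in the ANSWER.
2. For each claim, decide whether the CONTEXT supports it.
Reply with JSON only, in the form:
{{"claims": [{{"text": "...", "supported": true}}]}}

CONTEXT:
{context}

ANSWER:
{answer}
"""


def build_judge_prompt(context: str, answer: str) -> str:
    """Fill the template with the retrieved context and the agent's answer."""
    return JUDGE_PROMPT.format(context=context, answer=answer)


def score_from_judge_reply(reply: str) -> float:
    """Parse the judge's JSON reply and return the supported fraction."""
    verdicts = json.loads(reply)["claims"]
    if not verdicts:
        # Assumption: no claims means nothing unfaithful was said.
        return 1.0
    supported = sum(1 for verdict in verdicts if verdict["supported"] is True)
    return supported / len(verdicts)


# Example reply a judge model might return (hand-written for illustration).
example_reply = json.dumps({
    "claims": [
        {"text": "The Pro plan includes SSO", "supported": False},
        {"text": "The Enterprise plan includes SSO", "supported": True},
    ]
})
print(score_from_judge_reply(example_reply))
```

In practice the parser also needs to handle malformed replies. Treating a reply that fails to parse as a failed check is the conservative choice.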

Why this catches what model size does not

Bigger models are not less likely to hallucinate. They are more confident hallucinators. Given the same insufficient context, GPT-4o will produce a better-written, more structured, more authoritative-sounding wrong answer than GPT-3.5 ever could.

The faithfulness gate works at a different layer than the model. It does not care how confident the model sounds. It only cares whether the claims in the response can be traced back to the retrieved context.

In the team's audit, faithfulness gates caught about 40% of the responses that customers had previously reported as wrong. Most of those would not have been caught by switching to a more expensive model.

The threshold question

Where to set the faithfulness threshold is a product decision, not a technical one.

  • 0.95 and above: very strict. Use for legal advice, medical information, financial recommendations, regulatory compliance. The cost is more "I cannot answer" responses, which is the right cost for high-stakes domains.
  • 0.85 to 0.95: production default for B2B SaaS. Catches most confident hallucinations without rejecting legitimate responses that have minor unsupported flourishes.
  • 0.70 to 0.85: more permissive. Use for internal tools where users can self-verify, or for early-stage products where rejecting too many responses kills the UX.
  • Below 0.70: effectively disabled. Not recommended for customer-facing.

The team we worked with was in B2B SaaS. We set the threshold at 0.88 initially, monitored the rejection rate (about 6% of responses), and lowered it to 0.85 after a week because that rejection rate felt too aggressive for the user experience.
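
One way to make that tuning less of a guess is to replay logged faithfulness scores against a few candidate thresholds before changing the live value. The sketch below assumes you already store one score per response; the scores shown are made up for illustration and are not the team's data.

```python
from typing import Dict, Sequence


def rejection_rate(scores: Sequence[float], threshold: float) -> float:
    """Share of responses whose faithfulness score falls below the threshold."""
    if not scores:
        return 0.0
    rejected = sum(1 for score in scores if score < threshold)
    return rejected / len(scores)


def sweep(scores: Sequence[float], thresholds: Sequence[float]) -> Dict[float, float]:
    """Rejection rate for each candidate threshold."""
    return {t: rejection_rate(scores, t) for t in thresholds}


# Hypothetical logged scores, one per response.
logged_scores = [1.0, 1.0, 0.9, 0.86, 1.0, 0.8, 1.0, 0.87, 1.0, 0.5]

for threshold, rate in sweep(logged_scores, [0.85, 0.88, 0.95]).items():
    print(f"threshold {threshold:.2f}: {rate:.0%} rejected")
```

Pairing each candidate with the number of known-bad responses it would have caught gives the other half of the trade-off.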

What to do when the gate fails

The agent has three options when a response fails the faithfulness check:

Retry with augmented context. The agent searches again with a query informed by the failure. Sometimes the original retrieval was insufficient and a second pass surfaces the missing context. Retry once, max twice. Beyond that, do not loop.

Return "I cannot answer this confidently." Honest about the limitation. Surfaces a real product problem (insufficient documentation, ambiguous query) that the team can address. Better than a confident wrong answer.

Escalate to human handoff. The agent surfaces the question to a human support agent, with the retrieved context attached. Useful for customer-facing systems where "I don't know" is not an acceptable terminal state.

Production teams ship all three. Retry first (cheap, and it often resolves the problem), fall back to an honest "I don't know" (acceptable for low-stakes questions), and escalate for high-stakes or repeated questions.
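
The three options compose into one small control loop. The following is a sketch under stated assumptions: `generate` is a hypothetical callable that runs retrieval plus generation and returns the response with its faithfulness score, `high_stakes` is a flag you would set from your own routing rules, and the escalation branch only returns the question, where a real system would also attach the retrieved context.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

FALLBACK_TEXT = "I cannot answer this confidently from available information."


@dataclass
class GateResult:
    action: str    # "answer", "fallback", or "escalate"
    text: str
    attempts: int


def answer_with_gate(
    question: str,
    generate: Callable[[str, int], Tuple[str, float]],
    threshold: float = 0.85,
    max_retries: int = 1,
    high_stakes: bool = False,
) -> GateResult:
    """Run the gate with bounded retries, then fall back or escalate.

    generate(question, attempt) returns (response, faithfulness_score);
    attempt > 0 signals that retrieval should use an augmented query.
    """
    attempts = 0
    for attempt in range(max_retries + 1):
        attempts = attempt + 1
        response, score = generate(question, attempt)
        if score >= threshold:
            return GateResult("answer", response, attempts)
    # Retries exhausted: never loop further.
    if high_stakes:
        return GateResult("escalate", question, attempts)
    return GateResult("fallback", FALLBACK_TEXT, attempts)


def stub_generate(question: str, attempt: int) -> Tuple[str, float]:
    # Stub for illustration: the second retrieval finds the missing context.
    if attempt == 0:
        return ("Yes, your Pro plan includes SSO.", 0.5)
    return ("SSO is included in the Enterprise tier.", 1.0)


result = answer_with_gate("Does Pro include SSO?", stub_generate)
print(result.action, result.attempts)
```

Keeping `max_retries` as an explicit parameter makes the "do not loop" rule visible in the call site rather than buried in the agent logic.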

What we shipped for the team

The original system was a customer support agent with RAG over the documentation. We added:

  1. Faithfulness check on every response, using GPT-4o-mini as the judge model.
  2. Threshold of 0.85 for production responses.
  3. Retry once with augmented retrieval if the first response failed the check.
  4. Honest fallback ("I cannot find that specific information in our documentation. Would you like me to escalate to a human agent?") for responses that failed twice.
  5. Logging of every failed faithfulness check, so the team can review patterns and improve documentation coverage.

Customer-reported wrong answers dropped 60% in the first month. The faithfulness gate did not improve correctness in the abstract; it just stopped the system from confidently shipping wrong answers to customers. The team initially worried that the honest "I don't know" responses would make users unhappy, but they turned out to be well received. Users prefer "I don't know" to wrong answers, even when they think they want fast answers.

The unexpected benefit was the failed-check log. The team now had a list of every question the documentation could not confidently answer. That became the documentation backlog. Six months in, customer-reported issues had dropped 80% from the pre-gate baseline, partly from the gate and partly from the documentation improvements the gate surfaced.
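
Turning the failed-check log into a backlog can be as simple as counting failures per topic. The sketch below assumes each log entry carries a `topic` field, which is a hypothetical addition (for example, the section of the documentation the retrieval hit); the entries themselves are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def documentation_backlog(
    failed_checks: Iterable[Mapping[str, str]],
    top_n: int = 5,
) -> List[Tuple[str, int]]:
    """Count failed faithfulness checks per topic, most frequent first."""
    counts = Counter(entry["topic"] for entry in failed_checks)
    return counts.most_common(top_n)


# Hypothetical log entries.
failed_checks = [
    {"question": "Does Pro include SSO?", "topic": "plans"},
    {"question": "Is SAML available on Pro?", "topic": "plans"},
    {"question": "How long are audit logs kept?", "topic": "audit-logs"},
]

print(documentation_backlog(failed_checks))
```

Topics that keep appearing at the top of this list are the documentation gaps most worth closing first.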

When the gate is not enough

A faithfulness gate prevents one specific failure mode: claims unsupported by retrieved context. It does not catch:

  • Wrong context retrieved. If the RAG pipeline pulled the wrong document, the response will be faithful to the wrong source. This needs a separate evaluation of retrieval quality.
  • Outdated context. The response is faithful to documentation that was correct six months ago and is now stale. This needs versioning and freshness tracking.
  • Subtly wrong reasoning. Each claim is supported by the context, but the inference connecting them is invalid. This needs stronger evaluation, possibly human review.

The gate is necessary but not sufficient for production reliability. It is the highest-leverage single intervention, but it is not the only intervention.

The Sapota recommendation

For production agents that handle factual queries (customer support, internal knowledge, compliance, anything where being wrong has cost):

  • Add a faithfulness gate on the response path
  • Use a cheap judge model (GPT-4o-mini, Haiku) to keep costs low
  • Set threshold at 0.85 to start, tune based on rejection rate
  • Implement retry-once and honest-fallback policies
  • Log every failure for documentation improvement

The infrastructure cost is roughly $0.001 per response. The reduction in customer-reported errors is typically 40 to 60% in the first month.

This is not optional for production B2B agents. It is the layer that turns a demo into a product.

If your agent has been confidently wrong

If your team has had customers report incorrect answers from your AI assistant, and "we'll switch to a better model" has not fixed it, the missing layer is almost certainly faithfulness checking.

Sapota offers a one-week implementation engagement that adds faithfulness checking to your existing agent, calibrates the threshold against your historical reports, and ships the retry and fallback logic as a working PR. We have done this for customer support agents, internal knowledge bases, and compliance tools.

Reach out via the AI engineering page with a few examples of incorrect responses your agent has given. The diagnostic conversation usually surfaces both the faithfulness gap and the documentation gaps that the gate will help expose.