Hallucination Detection at the Trace Layer: Four Detectors You Can Deploy Today
Gabriel Anha · 2026-05-24 · via DEV Community

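You can't catch every hallucination at runtime. Running a ground-truth verification model inline doubles your cost, pushes p99 up, and spends budget on requests that were fine to begin with. But you can cheaply catch four classes of hallucination after the fact, in the trace stream. The detectors run on spans your app already emits. The user gets the answer at normal latency. The detector flags the bad one a few seconds later, and the Slack alert fires before the support ticket arrives.

This post walks through four detectors that have paid off in real LLM products: citation grounding, confidence anomaly, schema violation, and self-consistency divergence. Each is 30 to 80 lines of Python. Then we wire them into an OpenTelemetry SpanProcessor so they run asynchronously on the traces your app emits. Then we calibrate them, because without calibration every one of them cries wolf.

Why trace-layer detection beats inline checks

The inline approach: before returning the LLM response to the user, have another model judge whether it is hallucinated. NeMo Guardrails, Lynx, and most "eval-as-a-service" offerings follow this pattern. The math is not in your favor.

Say each user request makes one LLM call (about 400 ms). Inline detection adds a second LLM call to judge grounding (about 600 ms, since grounding models tend to be slower because they ingest the full context). Your median latency goes from 400 ms to 1,000 ms. p99 doubles to two seconds. Token cost triples. You pay this on 100% of requests, when perhaps 2% are hallucinated and actually worth checking.

Trace-layer detection runs afterwards. The response has already gone out. The user sees the answer at native speed. Your span processor picks up the finished span, runs the detectors asynchronously, and writes flags back to your observability backend or to a separate "incidents" queue. If a detector fires, you find out, and the user experience takes no hit.

The real tradeoff is that you can't block a bad response. The user has already seen it. That sounds bad at first, but less so once you compare it with the alternative. Catching 80% of hallucinations and apologizing for the rest beats throttling 100% of traffic in the hope of occasionally catching one. For most products, the latency win beats a no-hallucination guarantee.

Detector 1: Citation grounding

The most common hallucination in RAG systems: the source the model cites doesn't actually support the claim. It looks plausible (there is a citation), but the cited passage says something different or is entirely unrelated to the claim.

Citation grounding is a text/embedding check between each cited passage and the sentence that cites it. The basic version needs no additional LLM call.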

import re
from dataclasses import dataclass
from sentence_transformers import SentenceTransformer, util

# load once at module level, not per-request
_embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")

@dataclass
class GroundingResult:
    sentence: str
    citation_id: str
    similarity: float
    grounded: bool

CITATION_RE = re.compile(r"\[(\d+)\]")

def check_grounding(
    answer: str,
    sources: dict[str, str],
    threshold: float = 0.55,
) -> list[GroundingResult]:
    """For each sentence with [n] citations, check that the cited
    source actually contains the claim. Returns one row per (sentence,
    citation) pair so you can see which citation in a multi-cite
    sentence is the weak one."""
    results: list[GroundingResult] = []
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())

    for sentence in sentences:
        cite_ids = CITATION_RE.findall(sentence)
        if not cite_ids:
            continue
        # strip citation markers before embedding — they're noise
        clean = CITATION_RE.sub("", sentence).strip()
        sent_emb = _embedder.encode(clean, convert_to_tensor=True)

        for cid in cite_ids:
            src_text = sources.get(cid, "")
            if not src_text:
                results.append(GroundingResult(clean, cid, 0.0, False))
                continue
            src_emb = _embedder.encode(src_text, convert_to_tensor=True)
            sim = float(util.cos_sim(sent_emb, src_emb).item())
            results.append(
                GroundingResult(clean, cid, sim, sim >= threshold)
            )
    return results


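The 0.55 threshold is a starting point, not a rule. Move it after calibration. With BGE embeddings, the cosine similarity between a factual claim and a supporting passage typically falls between 0.4 and 0.85 in practice. Below 0.4 is almost always fabrication; above 0.7 is almost always grounded. The gray zone in between needs labelled traces to tune (we'll get to that).

Gotcha: if the cited source is long, embedding the whole thing dilutes the signal. The cited claim may match just one section of a 2,000-word source. Cosine similarity against the full document comes out middling, and the match is lost. Better: split the source into sentence windows, embed each window, and take the max similarity across windows.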
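The windowing idea can be sketched as follows. This is a minimal illustration, not the author's implementation: the names sentence_windows and max_window_similarity, the window size, and the pluggable embed callable are all assumptions introduced here. With the article's setup you could pass something like `lambda s: _embedder.encode(s)` as embed.

```python
import math
import re
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sentence_windows(text: str, size: int = 3) -> list[str]:
    """Split text into overlapping windows of `size` sentences (stride 1)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) <= size:
        return [" ".join(sentences)] if sentences else []
    return [
        " ".join(sentences[i:i + size])
        for i in range(len(sentences) - size + 1)
    ]


def max_window_similarity(
    claim: str,
    source: str,
    embed: Callable[[str], Vector],  # hypothetical hook: any text -> vector function
    size: int = 3,
) -> float:
    """Best cosine similarity between the claim and any window of the source."""
    windows = sentence_windows(source, size)
    if not windows:
        return 0.0
    claim_vec = embed(claim)
    return max(cosine(claim_vec, embed(w)) for w in windows)
```

Swapping this in for the single whole-source embedding inside check_grounding costs one extra embedding per window, so the window size is a tradeoff between sharper matches and more embedding work.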

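Detector 2: Confidence anomaly from logprobs

When a model fabricates a fact it has never seen, its logprob distribution often looks odd. Confident-and-wrong is the worst failure mode, but it has a signature: flat per-token entropy. The model commits to tokens it would normally hedge on.

You need logprobs in the response. OpenAI exposes them via logprobs=True, top_logprobs=5. Anthropic, as far as I have checked, does not, so this detector only works with providers that expose them.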

import math
from statistics import mean

def token_entropy(top_logprobs: list[dict]) -> float:
    """Shannon entropy over the top-k token distribution at one
    position. High entropy = model is unsure. Low entropy = model
    is committed."""
    probs = [math.exp(t["logprob"]) for t in top_logprobs]
    total = sum(probs)
    if total == 0:
        return 0.0
    norm = [p / total for p in probs]
    return -sum(p * math.log(p) for p in norm if p > 0)

def confidence_anomaly_score(
    tokens: list[dict],
    baseline_mean_entropy: float,
    baseline_stdev: float,
) -> float:
    """Z-score of this response's mean token entropy against
    the baseline you computed on labelled good traces. Returns
    abs(z). Above 2.5 is worth flagging."""
    if not tokens:
        return 0.0
    per_token = [
        token_entropy(t.get("top_logprobs", []))
        for t in tokens
        if t.get("top_logprobs")
    ]
    if not per_token:
        return 0.0
    response_entropy = mean(per_token)
    if baseline_stdev == 0:
        return 0.0
    return abs(response_entropy - baseline_mean_entropy) / baseline_stdev


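Compute baseline_mean_entropy and baseline_stdev weekly from traces labelled as good. Store them in Redis or in a YAML file your pipeline loads. Each trace then yields a single z-score.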
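A minimal sketch of that weekly baseline job, under stated assumptions: the input is one mean token entropy per labelled-good response (for example, computed with token_entropy above), the function names are hypothetical, and JSON stands in for the YAML or Redis storage the text mentions so the example stays within the standard library.

```python
import json
from statistics import mean, stdev


def compute_baseline(mean_entropies: list[float]) -> dict[str, float]:
    """Summarize labelled-good responses into the two numbers the
    z-score needs. Uses the sample standard deviation, so at least
    two responses are required."""
    if len(mean_entropies) < 2:
        raise ValueError("need at least two labelled-good responses")
    return {"mean": mean(mean_entropies), "stdev": stdev(mean_entropies)}


def save_baseline(baseline: dict[str, float], path: str) -> None:
    """Persist the baseline where the detector can load it at startup."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(baseline, fh)


def load_baseline(path: str) -> dict[str, float]:
    """Read back a baseline written by save_baseline."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

The "mean" and "stdev" keys match the shape of the baseline dict passed to the span processor later in the post.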

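The signal-to-noise ratio is worse than for citation grounding. A z-score of 2.5 means "this response's mean token confidence sits 2.5 standard deviations from baseline." That catches confident hallucinations, but it also catches responses that are fine and merely unusual (the answer to "what is 2+2" is more certain than a typical response). Use this detector as a tie-breaker or as a filter on the other detectors, not as a primary alerting signal.

Detector 3: Schema and format violations

If your prompt asks for structured JSON, any output that doesn't match the schema is a hallucination of intent. Even when the content is correct, a broken schema means the model didn't follow the contract, and downstream code either crashes or silently drops the field.

This is the cheapest of the four and has the lowest false-positive rate. Always run it.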

import json
from dataclasses import dataclass
from jsonschema import Draft202012Validator

@dataclass
class SchemaResult:
    valid: bool
    errors: list[str]
    parse_failed: bool

def check_schema(raw_output: str, schema: dict) -> SchemaResult:
    """Validate the model's raw text against a JSON Schema. Both
    parse-fail and validation-fail count as hallucinations of
    intent — the model didn't follow the contract."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as e:
        return SchemaResult(False, [f"json parse: {e}"], True)

    validator = Draft202012Validator(schema)
    errors = [
        f"{'.'.join(str(p) for p in e.absolute_path) or '<root>'}: "
        f"{e.message}"
        for e in validator.iter_errors(parsed)
    ]
    return SchemaResult(len(errors) == 0, errors, False)


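If you're on OpenAI, pair this with response_format={"type": "json_schema", ...}. That guards against schema violations at generation time. In practice, though, structured outputs still occasionally drop optional fields or hallucinate enum values when a long stream gets truncated. The detector catches those.

A common pattern: a tool-calling agent emits {"tool": "search_docs", "args": {"q": "..."}}. Then the model gets creative and emits {"tool": "search_documents", "args": {"q": "..."}}. The JSON is valid, but the schema rejects it because tool is not in the enum. Your tool dispatcher silently returns nothing, and the user gets a hallucinated response with no results behind it. The schema check fires, and you can see it.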
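To make the tool-name case concrete, here is a sketch of a schema that would reject search_documents. The schema contents (including the second tool name, fetch_page) and the helper unknown_tool are hypothetical. The helper is a standard-library stand-in that only performs the enum check; it does not replace full JSON Schema validation with check_schema.

```python
import json
from typing import Optional

# Hypothetical schema for the tool call in the example above.
TOOL_CALL_SCHEMA = {
    "type": "object",
    "properties": {
        "tool": {"enum": ["search_docs", "fetch_page"]},
        "args": {"type": "object"},
    },
    "required": ["tool", "args"],
    "additionalProperties": False,
}


def unknown_tool(raw_output: str, schema: dict = TOOL_CALL_SCHEMA) -> Optional[str]:
    """Return the tool name if it is present but not in the schema's
    enum, otherwise None. Parse failures and missing fields are left
    to the full schema check."""
    allowed = schema["properties"]["tool"]["enum"]
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    tool = call.get("tool")
    if isinstance(tool, str) and tool not in allowed:
        return tool
    return None
```

Passing TOOL_CALL_SCHEMA to check_schema should produce a validation error on the tool field for the second payload, which is the signal a silently failing dispatcher would otherwise swallow.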

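Detector 4: Self-consistency divergence

Ask the same question N times with temperature > 0. If the answers diverge beyond a threshold, the model is making things up. If they converge, the model is committed to its answer (that doesn't prove the answer is correct, but it does show it isn't a coin flip).

This costs N× tokens, so don't run it on every trace. Run it as a sampling detector: pick 1% of the traces that carry a "high-risk" attribute (medical, financial, or legal questions) and run the consistency check on those.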
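One way to pick that 1% is to hash the trace ID, so the decision is deterministic and the same trace is always in or out of the sample. This is a sketch under assumptions: the category names, the HIGH_RISK set, and the function should_sample are hypothetical, and how a trace gets its risk category is left to your application.

```python
import hashlib
from typing import Optional

# Hypothetical risk categories your app attaches to a trace.
HIGH_RISK = {"medical", "financial", "legal"}


def should_sample(
    trace_id: str,
    risk_category: Optional[str],
    rate: float = 0.01,
) -> bool:
    """Deterministically select roughly `rate` of high-risk traces.

    The trace ID is hashed to a number in [0, 1); the trace is sampled
    when that number falls below `rate`."""
    if risk_category not in HIGH_RISK:
        return False
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Hash-based sampling keeps reruns and backfills consistent, which random.random() would not.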

import asyncio
from openai import AsyncOpenAI

_client = AsyncOpenAI()

async def _sample_once(messages: list[dict], model: str) -> str:
    resp = await _client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
        max_tokens=400,
    )
    return resp.choices[0].message.content or ""

async def consistency_score(
    messages: list[dict],
    n: int = 4,
    model: str = "gpt-4o-mini",
) -> float:
    """Sample N times, embed each sample, return mean pairwise
    cosine similarity. 1.0 = identical, <0.65 = divergent."""
    samples = await asyncio.gather(
        *(_sample_once(messages, model) for _ in range(n))
    )
    embs = _embedder.encode(samples, convert_to_tensor=True)
    sims: list[float] = []
    for i in range(n):
        for j in range(i + 1, n):
            sims.append(float(util.cos_sim(embs[i], embs[j]).item()))
    return sum(sims) / len(sims) if sims else 0.0


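Gotcha: self-consistency catches hallucinations on factual questions but misses systematic ones, where every sample is wrong and states the falsehood the same way (because the samples share the same priors). It works well for "what year did X happen" questions and poorly for "summarize this document" requests, where the failure mode is being consistently wrong.

Wiring the detectors into an OTel span processor

The whole point of trace-layer detection is that it runs on spans the app already emits. The cleanest approach is a custom SpanProcessor that fires when each LLM span ends, runs the detectors on a background thread, and records the results as span attributes.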

import json
import logging
from concurrent.futures import ThreadPoolExecutor
from opentelemetry.sdk.trace import SpanProcessor, ReadableSpan
from opentelemetry import trace

log = logging.getLogger("hallucination-detector")
_pool = ThreadPoolExecutor(max_workers=8)

class HallucinationSpanProcessor(SpanProcessor):
    def __init__(self, schema_registry: dict, baseline: dict):
        self.schema_registry = schema_registry  # operation -> schema
        self.baseline = baseline  # {"mean": float, "stdev": float}

    def on_start(self, span, parent_context=None):
        pass

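    def on_end(self, span: ReadableSpan):
        # only run on LLM spans; convention: name starts with "llm."
        if not span.name.startswith("llm."):
            return
        # skip our own "llm.detector.*" result spans: they also match
        # the "llm." prefix, and re-processing them would emit result
        # spans in an endless loop
        if span.name.startswith("llm.detector."):
            return
        _pool.submit(self._run_detectors, span)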

    def _run_detectors(self, span: ReadableSpan):
        try:
            attrs = dict(span.attributes or {})
            answer = attrs.get("llm.response.content", "")
            sources_json = attrs.get("rag.sources_json", "{}")
            tokens_json = attrs.get("llm.response.tokens_json", "[]")
            operation = attrs.get("llm.operation", "")
            sources = json.loads(sources_json)
            tokens = json.loads(tokens_json)

            findings: dict[str, object] = {}

            if sources:
                g = check_grounding(answer, sources)
                ungrounded = [r for r in g if not r.grounded]
                findings["grounding.ungrounded_count"] = len(ungrounded)
                findings["grounding.min_sim"] = (
                    min((r.similarity for r in g), default=1.0)
                )

            if tokens:
                z = confidence_anomaly_score(
                    tokens,
                    self.baseline["mean"],
                    self.baseline["stdev"],
                )
                findings["confidence.zscore"] = z

            schema = self.schema_registry.get(operation)
            if schema:
                s = check_schema(answer, schema)
                findings["schema.valid"] = s.valid
                if not s.valid:
                    findings["schema.errors"] = "; ".join(s.errors[:3])

            self._publish(span, findings)
        except Exception:
            log.exception("detector failed for span %s", span.name)

    def _publish(self, span: ReadableSpan, findings: dict):
        # write back as a follow-up span: you can't mutate a
        # finished span, but you CAN emit a child span that shares
        # its trace_id and uses the finished span as its parent.
        tracer = trace.get_tracer("hallucination-detector")
        ctx = trace.set_span_in_context(
            trace.NonRecordingSpan(span.get_span_context())
        )
        with tracer.start_as_current_span(
            "llm.detector.result", context=ctx
        ) as result_span:
            for k, v in findings.items():
                result_span.set_attribute(k, v)
            # fire your alerting hook here if thresholds are crossed
            if findings.get("grounding.ungrounded_count", 0) > 0:
                result_span.set_attribute("alert.fired", True)

    def shutdown(self):
        _pool.shutdown(wait=True)

    def force_flush(self, timeout_millis: int = 30_000):
        return True


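The trick is in the _publish step. Once on_end has been called, the span is read-only and can't be modified. So you emit a new span with the same trace ID, carrying the detection results. Your backend (Honeycomb, Grafana Tempo, Jaeger, Langfuse) will show this span next to the original LLM span. Query for alert.fired = true and you have a hallucination dashboard.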
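The processor reads its inputs from span attributes, so the application has to put them there when it emits the LLM span. OpenTelemetry attribute values must be primitives (or flat sequences of primitives), which is why the sources and tokens travel as JSON strings. A minimal sketch of the producing side follows; the helper name llm_span_attributes is hypothetical, while the attribute keys are the ones the processor above reads.

```python
import json


def llm_span_attributes(
    answer: str,
    sources: dict[str, str],
    tokens: list[dict],
    operation: str,
) -> dict[str, str]:
    """Build the flat attribute dict that HallucinationSpanProcessor
    expects. Nested structures are JSON-encoded because span
    attributes cannot hold dicts."""
    return {
        "llm.response.content": answer,
        # keys are citation ids as strings ("1", "2", ...), matching
        # what CITATION_RE.findall returns in check_grounding
        "rag.sources_json": json.dumps(sources),
        "llm.response.tokens_json": json.dumps(tokens),
        "llm.operation": operation,
    }


# Usage inside your app, assuming `tracer` is an OpenTelemetry tracer:
#
#   with tracer.start_as_current_span("llm.chat") as span:
#       answer, sources, tokens = call_model(...)  # your own code
#       span.set_attributes(
#           llm_span_attributes(answer, sources, tokens,
#                               "answer_with_citations")
#       )
```

Large responses can push attribute sizes past backend limits, so it may be worth truncating or sampling the token list before attaching it.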

Register the processor on your TracerProvider at startup:

from opentelemetry.sdk.trace import TracerProvider

provider = TracerProvider()
provider.add_span_processor(
    HallucinationSpanProcessor(
        schema_registry={"answer_with_citations": ANSWER_SCHEMA},
        baseline={"mean": 1.42, "stdev": 0.38},
    )
)
trace.set_tracer_provider(provider)


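The part that matters: calibrate on labelled traces, or every alert is a false alarm

Every detector here has a knob: a threshold, a z-score cutoff, a minimum similarity. Ship with the defaults and you either alert constantly or never catch anything. Neither is useful.

The calibration loop is simple, but it is not optional.

  1. Sample 200 to 500 traces from the last two months of production.
  2. Have a human (you or an annotator) label each one clean or hallucinated.
  3. Run all four detectors on the labelled traces.
  4. For each detector, sweep the threshold and compute precision/recall at each setting.
  5. Pick the threshold that matches the precision your alerting can tolerate (typically around 0.7, since roughly one false alarm in three alerts is tolerable before fatigue sets in).
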
import csv
from dataclasses import dataclass

@dataclass
class LabelledTrace:
    trace_id: str
    is_hallucination: bool
    grounding_min_sim: float

def calibrate_grounding(
    traces: list[LabelledTrace],
    candidate_thresholds: list[float],
) -> list[tuple[float, float, float]]:
    """For each threshold, return (threshold, precision, recall) for
    treating min_sim < threshold as a hallucination flag."""
    rows = []
    for t in candidate_thresholds:
        tp = sum(
            1 for x in traces
            if x.grounding_min_sim < t and x.is_hallucination
        )
        fp = sum(
            1 for x in traces
            if x.grounding_min_sim < t and not x.is_hallucination
        )
        fn = sum(
            1 for x in traces
            if x.grounding_min_sim >= t and x.is_hallucination
        )
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append((t, precision, recall))
    return rows


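Run this on your labelled set, look at the precision/recall curve, and pick the knee. Store the chosen threshold in a versioned config file that your detectors read at startup. Recalibrate every time you change model versions, because the baseline shifts too.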
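If you would rather pick the threshold mechanically than eyeball the curve, one simple rule is "highest recall among thresholds that meet a precision floor." The sketch below applies that rule to the (threshold, precision, recall) rows returned by calibrate_grounding; the function name pick_threshold and the 0.7 default are assumptions taken from the precision target mentioned above, not a rule the post prescribes.

```python
from typing import Optional

Row = tuple[float, float, float]  # (threshold, precision, recall)


def pick_threshold(
    rows: list[Row],
    min_precision: float = 0.7,
) -> Optional[Row]:
    """Return the row with the best recall among those whose precision
    is at least `min_precision`; ties go to the higher precision.
    Returns None when no threshold meets the floor."""
    eligible = [r for r in rows if r[1] >= min_precision]
    if not eligible:
        return None
    return max(eligible, key=lambda r: (r[2], r[1]))
```

A None result is itself useful information: it means the detector cannot reach the precision you need on this data at any of the candidate thresholds.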

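False-positive rates compound across detectors. If each of your four detectors has a 5% false-positive rate and you alert when any one of them fires, your per-trace false-positive rate is about 18%. Either alert only when two or more detectors agree, or weight them: citation grounding and schema checks raise an alert, while confidence anomaly and consistency divergence go to a Slack channel that nobody is required to answer.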
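The arithmetic behind that figure, as a short sketch. It assumes the detectors' false positives are independent, which real detectors only approximate; the function name prob_at_least is introduced here for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent detectors fire
    on a clean trace, when each has false-positive rate p."""
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


any_fires = prob_at_least(1, 4, 0.05)  # 1 - 0.95**4, about 0.185
two_agree = prob_at_least(2, 4, 0.05)  # about 0.014
```

Under the independence assumption, requiring two detectors to agree drops the false-alarm rate from roughly 18.5% to roughly 1.4%, at the cost of missing hallucinations that only one detector sees.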

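The detectors are cheap. Calibration is what makes the whole pipeline useful rather than just more noise.

Which detector would you run first in your stack? And what is stopping you from running it already?


If this was useful

Hallucination detection is one corner of the larger trace-layer evaluation picture. The LLM Observability Pocket Guide walks through tracing, compares the tools, details the calibration process, and weighs Langfuse, Arize Phoenix, and Honeycomb against building your own on OTel. The chapter on "online evaluation" is the closest match to this post.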

LLM Observability Pocket Guide: Picking the Right Tracing & Evals Tools for Your Team