Your LLM Traces Cost a Fortune to Store. Here's Why, and Three Fixes.
Gabriel Anha · 2026-05-24 · via DEV Community

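The message arrived on a Tuesday. Finance: "S3 spend tripled last quarter. What changed?" Engineering: "Nothing." Both were right. Two months earlier, someone had added LLM tracing (prompts in, responses out, full payload on every span). Nobody set a retention policy. The bucket grew by 14 GB a day, then 22, then 31.

This is not an edge case. It is the usual shape of an LLM tracing pipeline with no payload policy. The good news: three fixes help, listed here from easiest to hardest to deploy, and each one cuts cost without hurting the workflows you actually use tracing for.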

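Where the bytes go

A normal OTel span is tiny. Trace ID, span ID, parent, attributes, events, status. Maybe 2 KB once you decorate it with HTTP method, status code, user ID, and region. Your APM could keep those spans for a decade and you would never look at the bill.

LLM spans are different. They carry the prompt: system message, full chat history, retrieved context, tool definitions, response schema. Then the response. Sometimes, if reasoning traces are enabled, the reasoning trace as well. A single span from a long agent can be 80 KB. A 47-turn agent run reaches 4 MB. At 200 requests per second, payload bytes outweigh span metadata 50 to 1.

So when the S3 bucket grows by 30 GB a day, it is not a surge in volume. It is text. Text you wrote onto the trace because the SDK said "set gen_ai.prompt" and you did.

The first instinct is to sample fewer traces. That is the wrong instinct. The right one is to ask which payloads are worth storing.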

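Fix 1: sample successes, keep every failure

Production LLM traffic falls into two classes. About 99% of it succeeds and looks like the last ten thousand successes. The other 1% fails, errors, times out, returns nonsense, gets flagged by an eval, or draws a complaint. Keeping that 1% is the reason you have traces. Keeping all of the 99% is where the bill comes from.

The rule of thumb: sample successes aggressively and keep failures in full. Tail sampling makes this easy, because the decision is made after the span ends. By then you know whether it errored, whether it tripped a hallucination eval, and whether latency exceeded your threshold.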

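import random
from opentelemetry.trace import StatusCode

def should_keep_payload(span):
    """Tail-sampling decision, made after the span has ended."""
    # Always keep failures, eval-flagged spans, and slow requests (> 5 s).
    if span.status.status_code == StatusCode.ERROR:
        return True
    if span.attributes.get("llm.eval.flagged"):
        return True
    if span.attributes.get("llm.latency_ms", 0) > 5000:
        return True
    # success path: keep 1 in 50
    return random.random() < 0.02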


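That's it. You still emit the span metadata, token counts, latency, and model name, so dashboards and cost reports stay accurate. You only drop the prompt and response bodies on the 90%-plus of traffic that is routine.

This one change typically cuts payload storage by more than 90%. If you do only one thing from this post, do this.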
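
One caveat with a per-span random.random() call is that every span flips its own coin, so spans belonging to the same trace can end up with different decisions. The sketch below is one possible variation, not the post's method: it derives the success-path decision from the trace ID so a whole trace is kept or dropped together. The helper name keep_success_trace is hypothetical, and it assumes the trace ID is available as a 128-bit integer (as span.context.trace_id is in the OpenTelemetry Python SDK).

```python
import hashlib


def keep_success_trace(trace_id: int, rate: float = 0.02) -> bool:
    """Deterministically keep roughly `rate` of traces, keyed on the trace ID.

    Hypothetical helper: a drop-in replacement for `random.random() < 0.02`
    on the success path, so every span in a trace gets the same answer.
    """
    if rate <= 0.0:
        return False
    if rate >= 1.0:
        return True
    # Hash the 128-bit trace ID so the decision does not depend on how the
    # IDs themselves happen to be distributed.
    digest = hashlib.sha256(trace_id.to_bytes(16, "big")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(rate * 2**64)
```

The error, eval-flag, and latency checks stay exactly as they are; only the final success-path line would change.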

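Fix 2: tiered storage

Second, traces are not accessed evenly. Hot, warm, cold. Three tiers, three lifecycle rules, three price points.

Recent traces are the ones engineers open. Within 24 hours, certainly. Within 7 days, often. After that the access pattern collapses. Maybe once a month, during an investigation, someone pulls a 30-day-old trace. Paying S3 Standard rates for that traffic is theater.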

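# s3 lifecycle rule
# Caveat: S3 does not allow a lifecycle transition to STANDARD_IA before an
# object is 30 days old, so the 7-day threshold below may be rejected. Raise it
# (and shift the later transitions) if the API refuses the rule.
LifecycleConfiguration:
  Rules:
    - Id: llm-traces-tiering
      Status: Enabled
      Prefix: traces/
      Transitions:
        - Days: 7
          StorageClass: STANDARD_IA
        - Days: 30
          StorageClass: GLACIER_IR
        - Days: 180
          StorageClass: DEEP_ARCHIVE
      Expiration:
        Days: 730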
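
For teams that manage buckets from Python rather than a template, the same rule could be applied with boto3's put_bucket_lifecycle_configuration. This is a minimal sketch under stated assumptions: the function names are hypothetical, the S3 client is passed in by the caller, and the default day thresholds (30/90/180) are my own choice, not the post's 7/30/180, because to my knowledge S3 rejects a transition to STANDARD_IA before day 30.

```python
def build_trace_lifecycle(ia_days=30, glacier_days=90, archive_days=180,
                          expire_days=730, prefix="traces/"):
    """Build the LifecycleConfiguration dict for the trace bucket.

    Default thresholds are assumptions, chosen to satisfy S3's 30-day
    minimum before STANDARD_IA; tune them to your own access pattern.
    """
    if ia_days < 30:
        raise ValueError("S3 requires 30 days of storage before STANDARD_IA")
    if not ia_days < glacier_days < archive_days < expire_days:
        raise ValueError("day thresholds must be strictly increasing")
    return {
        "Rules": [{
            "ID": "llm-traces-tiering",
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_days, "StorageClass": "GLACIER_IR"},
                {"Days": archive_days, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": expire_days},
        }]
    }


def apply_trace_lifecycle(s3_client, bucket, config=None):
    """Apply the rule; `s3_client` is expected to be boto3.client("s3")."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config or build_trace_lifecycle(),
    )
```

Note that this call replaces the bucket's whole lifecycle configuration, so any existing rules would need to be merged into the dict first.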


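S3 Standard costs about $0.023 per GB-month. Standard-IA drops to about $0.0125. Glacier Instant Retrieval is about $0.004. Deep Archive is the cheapest at about $0.00099. Lifecycle transitions cost a few cents per thousand objects, but on a busy bucket that pays for itself within a day.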
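
To see what those prices mean for a bucket, here is a small worked calculation. The per-GB prices are the approximate figures quoted above; the 1,000 GB total and the way it is split across tiers are hypothetical numbers picked only for illustration. The sketch counts storage alone and ignores request, retrieval, transition, and minimum-duration charges.

```python
# USD per GB-month, the approximate figures quoted in the text.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_cost(gb_by_tier: dict) -> float:
    """Storage-only cost in USD for a {tier: GB stored} mapping."""
    return sum(PRICE_PER_GB_MONTH[tier] * gb for tier, gb in gb_by_tier.items())


# Hypothetical bucket holding 1,000 GB of traces.
all_standard = monthly_cost({"STANDARD": 1000})
tiered = monthly_cost({
    "STANDARD": 100,       # last few days
    "STANDARD_IA": 200,    # warm
    "GLACIER_IR": 300,     # cold but instantly readable
    "DEEP_ARCHIVE": 400,   # archive
})

print(f"all Standard: ${all_standard:.2f}/month")  # $23.00
print(f"tiered:       ${tiered:.2f}/month")        # $6.40
```

With this made-up split, tiering cuts the storage line to a little over a quarter of the all-Standard figure; the real ratio depends on how your data ages.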

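One gotcha worth calling out: Standard-IA bills a minimum of 128 KB per object. If you write one span per object and the payloads are small, you pay for 128 KB even when the object is 4 KB. Batch your trace writes (one object per trace stream per minute, or grouped by trace ID) so that each object is at least a few hundred KB. Teams that skip this step end up with an IA bill that looks like their Standard bill, and then write angry posts saying tiering doesn't work.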
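
The batching idea can be sketched in a few lines. This is an illustration under assumptions rather than a production writer: SpanBatcher, billable_bytes, and the 256 KB default threshold are hypothetical, the write_object callback stands in for whatever uploads bytes to S3, and a real implementation would also flush on a timer and at shutdown.

```python
import json

IA_MIN_BILLABLE_BYTES = 128 * 1024  # Standard-IA minimum billable object size


def billable_bytes(object_size: int) -> int:
    """Bytes Standard-IA bills for: small objects are rounded up to 128 KB."""
    return max(object_size, IA_MIN_BILLABLE_BYTES)


class SpanBatcher:
    """Buffer serialized spans and write them as one object once large enough."""

    def __init__(self, write_object, min_object_bytes: int = 256 * 1024):
        self.write_object = write_object          # callable(bytes) -> None
        self.min_object_bytes = min_object_bytes  # assumed flush threshold
        self._lines = []
        self._size = 0

    def add(self, span_record: dict) -> None:
        # One JSON document per line, so a batch can be split apart on read.
        line = json.dumps(span_record, separators=(",", ":")).encode("utf-8") + b"\n"
        self._lines.append(line)
        self._size += len(line)
        if self._size >= self.min_object_bytes:
            self.flush()

    def flush(self) -> None:
        if not self._lines:
            return
        self.write_object(b"".join(self._lines))
        self._lines, self._size = [], 0
```

The arithmetic behind the warning: 32 spans of 4 KB written as 32 separate objects are billed as 32 × 128 KB = 4 MB in Standard-IA, while the same spans written as a single 128 KB object are billed as 128 KB.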

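Fix 3: payload truncation with a rehydration token

The third fix targets the long tail. The 4 MB agent transcript is the outlier that wrecks your average. You don't want to drop it (the engineer debugging an agent loop needs it), but you also don't want it sitting inline in a span in hot trace storage.

The approach: truncate the payload in the span, write the full version to object storage under a content-addressed key, and store that key as an attribute. The trace UI shows the truncated preview plus a click-to-rehydrate button.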

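import hashlib
import boto3

s3 = boto3.client("s3")
PAYLOAD_BUCKET = "acme-llm-payloads"  # bucket name from the wiring example later in this post

def truncate_with_token(payload: str, span, max_inline: int = 2048):
    if len(payload) <= max_inline:
        return payload
    # Content-addressed key: identical payloads map to the same object.
    data = payload.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    key = f"traces/payloads/{digest[:2]}/{digest}.txt"
    s3.put_object(Bucket=PAYLOAD_BUCKET, Key=key, Body=data)
    span.set_attribute("llm.payload.s3_key", key)
    span.set_attribute("llm.payload.full_bytes", len(data))  # bytes, not characters
    return payload[:max_inline] + f"\n…[truncated, rehydrate: {digest[:12]}]"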


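Content-addressed keys mean identical payloads (system prompts, common tool definitions, repeated user queries) deduplicate for free. On real agent workloads this is another 40-60% in storage savings, because the system prompt is identical on every span and you stop paying to store the same 8 KB block a million times.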
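
The read side is not shown in the post, so here is a minimal sketch of what rehydration and the dedup property might look like. It assumes the key layout and the llm.payload.s3_key attribute used by truncate_with_token; the function names payload_key and rehydrate are hypothetical, and s3_client is expected to behave like boto3.client("s3").

```python
import hashlib


def payload_key(payload: str) -> str:
    """Content-addressed key, same layout as the truncation helper."""
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"traces/payloads/{digest[:2]}/{digest}.txt"


def rehydrate(s3_client, bucket, span_attributes):
    """Fetch the full payload for a span, or None if it was never truncated."""
    key = span_attributes.get("llm.payload.s3_key")
    if key is None:
        return None  # payload was short enough to stay inline
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    return obj["Body"].read().decode("utf-8")
```

Because the key depends only on the bytes, two spans that carry the same system prompt produce the same key, and the second put_object simply overwrites the first object with identical content. A writer could go one step further and skip the upload when the key already exists, at the cost of an extra existence check per payload.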

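A 40-line OTel SpanProcessor that does all three

Here is the version to ship. It wraps a BatchSpanProcessor and performs sampling, truncation, redaction, and rehydration-token rewriting before the span reaches the exporter. Put it in front of whatever exporter you use (Tempo, Honeycomb's OTLP endpoint, an S3-backed pipeline).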

import random, hashlib, re
from opentelemetry.sdk.trace import SpanProcessor
from opentelemetry.trace import StatusCode

PII = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[email]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        "[ssn]"),
    (re.compile(r"\b(?:\d[ -]*?){13,16}\b"),      "[card]"),
]
PAYLOAD_ATTRS = ("gen_ai.prompt", "gen_ai.response", "llm.input", "llm.output")

class LLMPayloadProcessor(SpanProcessor):
    def __init__(self, downstream, s3, bucket, inline_limit=2048):
        self.downstream, self.s3, self.bucket = downstream, s3, bucket
        self.inline_limit = inline_limit

    def _keep_full(self, span) -> bool:
        if span.status.status_code == StatusCode.ERROR: return True
        if span.attributes.get("llm.eval.flagged"):     return True
        if span.attributes.get("llm.latency_ms", 0) > 5000: return True
        return random.random() < 0.02

    def _redact(self, text: str) -> str:
        for pattern, repl in PII:
            text = pattern.sub(repl, text)
        return text

    def on_start(self, span, parent_context=None):
        self.downstream.on_start(span, parent_context)

    def on_end(self, span):
        keep = self._keep_full(span)
        for key in PAYLOAD_ATTRS:
            raw = span.attributes.get(key)
            if not isinstance(raw, str) or not raw:
                continue                           # missing or non-string payload
            clean = self._redact(raw)              # PII out before anything else
            if not keep:
                span._attributes[key] = "[sampled-out]"
                continue
            if len(clean) > self.inline_limit:
                # Content-addressed key: identical payloads share one object.
                # Note: this upload runs synchronously inside on_end.
                body = clean.encode("utf-8")
                digest = hashlib.sha256(body).hexdigest()
                obj_key = f"traces/payloads/{digest[:2]}/{digest}.txt"
                self.s3.put_object(Bucket=self.bucket, Key=obj_key, Body=body)
                span._attributes[key] = clean[:self.inline_limit] + f"\n…[rehydrate:{digest[:12]}]"
                span._attributes[f"{key}.s3_key"] = obj_key
            else:
                span._attributes[key] = clean
        self.downstream.on_end(span)

    def shutdown(self):
        self.downstream.shutdown()

    def force_flush(self, timeout_millis=30000):
        return self.downstream.force_flush(timeout_millis)

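Wire it up like this:

import boto3
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# gRPC exporter shown; the proto.http package exposes the same class name.
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

tracer_provider.add_span_processor(
    LLMPayloadProcessor(
        downstream=BatchSpanProcessor(OTLPSpanExporter()),
        s3=boto3.client("s3"),
        bucket="acme-llm-payloads",
    )
)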

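A few things in the code above are not accidental. PII redaction runs before the sampling check is applied, so even a payload that is about to be dropped has already been scrubbed in case a downstream log picks it up. Content-addressed S3 keys give you free dedup. The s3_key attribute is what the trace UI uses to rehydrate; you can put a tiny Lambda behind a signed URL to serve it. The sampling threshold can be tuned per environment. Staging can run a 30% error rate, and the "always keep errors" rule should not bury you there.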
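
The "tiny Lambda behind a signed URL" could look roughly like the sketch below. Everything here beyond boto3's generate_presigned_url is an assumption: the factory name, the s3_key query parameter, the API Gateway style event shape, and the 5-minute expiry are hypothetical, and authenticating the caller is left out entirely.

```python
def make_rehydrate_handler(s3_client, bucket: str, expires_in: int = 300):
    """Build a Lambda-style handler that redirects to a short-lived signed URL.

    `s3_client` is expected to be boto3.client("s3"); it is injected so the
    handler can be exercised without AWS credentials.
    """

    def handler(event, context=None):
        params = event.get("queryStringParameters") or {}
        key = params.get("s3_key", "")
        # Only serve objects under the prefix the processor writes to.
        if not key.startswith("traces/payloads/") or ".." in key:
            return {"statusCode": 400, "body": "invalid s3_key"}
        url = s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,
        )
        return {"statusCode": 302, "headers": {"Location": url}, "body": ""}

    return handler
```

The trace UI would link its rehydrate button to this endpoint with the span's s3_key attribute as the query parameter. Keeping the expiry short limits how long a leaked link stays useful.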

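The span._attributes mutation is a wart. OTel's public API treats span attributes as immutable once the span has ended, but by the time on_end runs nothing else is writing to the span, and the downstream BatchSpanProcessor only reads it later, when its worker thread exports the batch. In practice this is fine. If you want to be strict, wrap the span and emit it through your own exporter.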

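The one to be careful about: redact before storage, not at read time

The instinct that bites teams hardest is "store raw, redact on read." It sounds reasonable. Keep the original in case you need it, show the redacted version to engineers who have no need to know, and move on.

Then GDPR shows up, or SOC 2, or a customer DSR. Now you need an auditable answer to "how is personal data stored, who can read it, and how is it deleted?" If the answer is "we redact on read," then raw PII lives in S3 indefinitely. Regulators care about what is stored, not what is displayed.

Redact in the processor, before export. By the time a truncated payload reaches S3, emails, SSNs, card numbers, phone numbers, and any domain-specific identifiers (customer numbers, internal account IDs) should already be replaced with tokens. For the rare incident that needs the raw payload, set up a separate path with controlled access and encryption at rest. Make that path opt-in, enabled per request, never the default.

The redaction in the processor above is deliberately minimal. If you work in a regulated industry, add a tenant-specific rule layer. And run your redactor against a labeled eval set every month, because the day someone adds a new entity type and forgets to update the regexes is the day raw card numbers reappear in your trace store.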
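
A monthly redaction check does not need much machinery. The sketch below reuses the three patterns from the processor and runs them over a small labeled set; the find_leaks helper and the synthetic eval cases are hypothetical, and the last case is a phone number, an entity type the three patterns were never written to catch.

```python
import re

# The three patterns from the processor above.
PII = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[email]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[ssn]"),
    (re.compile(r"\b(?:\d[ -]*?){13,16}\b"), "[card]"),
]


def redact(text: str) -> str:
    for pattern, repl in PII:
        text = pattern.sub(repl, text)
    return text


def find_leaks(eval_set):
    """Return (text, secret) pairs where a labeled secret survives redaction."""
    leaks = []
    for text, secrets in eval_set:
        cleaned = redact(text)
        for secret in secrets:
            if secret in cleaned:
                leaks.append((text, secret))
    return leaks


# Synthetic examples only; each entry is (input text, secrets that must vanish).
EVAL_SET = [
    ("contact jane.doe@example.com please", ["jane.doe@example.com"]),
    ("ssn 123-45-6789 on file", ["123-45-6789"]),
    ("card 4111 1111 1111 1111 declined", ["4111 1111 1111 1111"]),
    ("call me at 415-555-0100", ["415-555-0100"]),  # not covered by PII above
]

for text, secret in find_leaks(EVAL_SET):
    print(f"LEAK: {secret!r} survived redaction of {text!r}")
```

Run as written, this reports exactly one leak, the phone number, which is the failure mode the paragraph above warns about. Wiring find_leaks into CI so a non-empty result fails the build is one way to make the monthly run hard to forget.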

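What this gets you

Put together (sample on success, tier the storage, truncate with rehydration, redact before storage), a workload that used to cost $9,000 a month in S3 lands in the $400 to $900 range. Debugging is unaffected, because errors and slow requests still keep their full payloads. Compliance posture improves, because PII is no longer sitting in object storage waiting to be read. Engineers don't notice the change, except that long traces now show a "load full payload" button.

What nobody tells you when you start running LLM tracing is that the payload path is a design decision. Spans are close to free. Prompts and responses are not.

What's the worst LLM observability bill you've seen, and which of these three fixes would have caught it? Drop it in the comments.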


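If this was useful

The trade-offs in this post (sampling shape, retention tiers, payload handling, where redaction sits in the pipeline) are exactly what my LLM Observability Pocket Guide covers. The chapter on trace pipeline design walks through the SpanProcessor pattern above, along with eval-flagging and self-consistency checks, so that sampling on success actually means something instead of being arbitrary. If you are choosing a tracing tool or trying to cut what you already pay, it is worth a read.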

LLM Observability Pocket Guide: Picking the Right Tracing & Evals Tools for Your Team