Per-Customer LLM Cost Reports (Without Rebuilding Your Billing Pipeline)
Gabriel Anha · 2026-05-24 · via DEV Community

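Finance walks over and asks: "How much did Acme Corp's LLM calls cost us last month?" You open the tracing UI. Every span has a request_id. None has a customer_id. Whoever wired up the instrumentation tagged spans the way the SDK examples do: per request. Customer attribution was Future You's problem. Now you are Future You.

The tempting fix is a rebuild. Add a customer_id parameter at every call site, roll it out across ten services, agree on a schema with the data team, run a warehouse migration. Six weeks of work. Finance needs the number tomorrow.

You do not need a rebuild. You need OpenTelemetry baggage, a versioned price table, and one aggregation worker. Leaving out the imports, it is about 80 lines.

Why "tag every call with the customer ID" is harder than it sounds

The naive answer is to pass it through: thread customer_id to every place the LLM client is called. Look at where your code actually makes those calls, though, and the plan falls apart.

Your synchronous HTTP path knows the customer. It is right there in the auth context. Easy. The background worker that fires a summarization job thirty seconds later gets a job ID, dequeues it from Redis, and calls the same LLM client. The customer is now two hops away, and you have to thread it through the job payload. Now do the same for the retry handler, the prefetcher that warms caches, the scheduled embedding job, the webhook fan-out, and the evaluator that replays prompts in CI.

Every one of those call sites is a chance to forget. The forgotten calls land in a "shared" bucket or go untagged altogether. Six months later that bucket is 40% of your spend and the finance lead is asking why.

The fix is to stop passing it explicitly. Set it once at the boundary and let the framework propagate it. That is what baggage is for.

OTel baggage as the propagation layer

Baggage in OpenTelemetry is a bag of key-value context that travels with the trace. Whatever is in the bag follows every span created under that context: across async boundaries, across thread boundaries where your runtime carries context over, and, once it is propagated over the wire, across process boundaries too.

You set the customer ID once: at the request handler, at job dequeue, at the scheduler tick. After that you can forget about it. Every span the LLM client emits inside that context carries the value as an attribute, because you install a span processor that copies it onto each span.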

from opentelemetry import baggage, context, trace
from opentelemetry.sdk.trace import SpanProcessor

class BaggageToAttributesProcessor(SpanProcessor):
    # whitelist the baggage keys you want promoted to span attrs.
    # promoting blindly is a privacy footgun.
    KEYS = ("customer_id", "tenant_id", "billing_account_id")

    def on_start(self, span, parent_context=None):
        ctx = parent_context or context.get_current()
        for key in self.KEYS:
            value = baggage.get_baggage(key, ctx)
            if value is not None:
                span.set_attribute(f"app.{key}", value)

    def on_end(self, span): pass
    def shutdown(self): pass
    def force_flush(self, timeout_millis=30_000): return True


Install this processor on the tracer provider at startup. Then, anywhere in your code:

ctx = baggage.set_baggage("customer_id", "cus_8H2k...")
token = context.attach(ctx)
try:
    # any LLM span created from here inherits app.customer_id
    response = llm_client.messages.create(...)
finally:
    context.detach(token)


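With HTTP and gRPC, the W3C baggage propagator carries the value downstream on its own. For your job queue, serialize the baggage into the job payload at enqueue and restore it at dequeue. That is about eight lines on the enqueue side and eight on the dequeue side. You write them once.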
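The enqueue/dequeue pair can be sketched as follows. This is a minimal, stdlib-only illustration: `contextvars` stands in for the OTel context so the example runs without dependencies, and in a real service the same two hooks would call OpenTelemetry's propagation API (`opentelemetry.propagate.inject` at enqueue, `extract` at dequeue) instead. The queue shape, the payload field names, the `summarize` handler, and the customer ID are all hypothetical.

```python
import contextvars
import json

# Stand-in for the OTel context: an ambient, context-local baggage dict.
_BAGGAGE = contextvars.ContextVar("baggage", default=None)

# Same allowlist idea as the span processor: only carry keys you chose.
CARRIED_KEYS = ("customer_id", "tenant_id", "billing_account_id")


def current_baggage() -> dict:
    """Return the baggage visible in the current context (never None)."""
    return _BAGGAGE.get() or {}


def enqueue(queue: list, job: dict) -> None:
    """Producer side: copy allowlisted baggage into the job payload."""
    carried = {k: v for k, v in current_baggage().items() if k in CARRIED_KEYS}
    queue.append(json.dumps({"job": job, "baggage": carried}))


def dequeue_and_run(queue: list, handler):
    """Consumer side: restore baggage around the handler, then clean up."""
    message = json.loads(queue.pop(0))
    token = _BAGGAGE.set(dict(message.get("baggage", {})))
    try:
        return handler(message["job"])
    finally:
        _BAGGAGE.reset(token)


def summarize(job: dict) -> str:
    # Business logic never takes customer_id as a parameter.
    return f"{job['task']} for {current_baggage().get('customer_id')}"


# Boundary 1: the HTTP handler sets baggage once, then enqueues.
queue: list = []
token = _BAGGAGE.set({"customer_id": "cus_demo_123", "debug_note": "not carried"})
try:
    enqueue(queue, {"task": "summarize"})
finally:
    _BAGGAGE.reset(token)

payload = json.loads(queue[0])  # what actually travels through the queue

# Boundary 2: the worker, later and outside the request context, restores it.
result = dequeue_and_run(queue, summarize)
print(result)
```

The handler reads the customer from the ambient context rather than from its arguments, which is the property that keeps the ID out of every function signature between the boundary and the LLM call.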

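The key rule: set baggage only at the boundary. The HTTP middleware. The event consumer. The scheduler entry point. Never inside business logic. If you catch yourself calling set_baggage inside a function, you have slid back into the per-call mess you were trying to escape.

A price book in SQL, versioned so billing can be replayed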

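The tag tells you who made the call. It does not tell you what the call cost. Cost is not constant: the provider cuts prices, you switch models, a new discount lands. If you compute cost as tokens * current_price at query time, every past quarter's bill shifts whenever prices do. That is not a bill; it is a guess.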
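The drift is easy to see with a few lines of arithmetic. In this sketch the first pair of rates matches the example price-book row used later in the article; the usage figures and the "after the price cut" rates are made up for illustration.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)


def cost_usd(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD given token counts and per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / MTOK


# One month of usage for one customer (hypothetical numbers).
usage = {"input_tokens": 40_000_000, "output_tokens": 5_000_000}

# Rates in force when the tokens were consumed.
rates_when_used = {"input_rate": Decimal("15.00"), "output_rate": Decimal("75.00")}
# Hypothetical rates after a later price cut.
rates_after_cut = {"input_rate": Decimal("12.00"), "output_rate": Decimal("60.00")}

cost_at_the_time = cost_usd(**usage, **rates_when_used)
cost_if_requeried = cost_usd(**usage, **rates_after_cut)
drift = cost_at_the_time - cost_if_requeried

print(cost_at_the_time, cost_if_requeried, drift)
```

The same month of usage produces two different answers depending on when the query runs, which is exactly what pinning each event to the rates valid at its timestamp avoids.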

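You need a versioned price table. Each row covers one (model, valid_from, valid_to) range and holds the per-million-token rates, broken out by direction and cache state.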

CREATE TABLE llm_price_book (
    model               TEXT       NOT NULL,
    valid_from          TIMESTAMPTZ NOT NULL,
    valid_to            TIMESTAMPTZ NOT NULL,
    input_per_mtok_usd          NUMERIC(10, 6) NOT NULL,
    output_per_mtok_usd         NUMERIC(10, 6) NOT NULL,
    cache_write_per_mtok_usd    NUMERIC(10, 6) NOT NULL,
    cache_read_per_mtok_usd     NUMERIC(10, 6) NOT NULL,
    PRIMARY KEY (model, valid_from)
);

-- example row: Claude pricing snapshot as of Q1 2026
INSERT INTO llm_price_book VALUES (
    'claude-opus-4-7',
    '2026-01-01 00:00:00+00',
    '2999-12-31 23:59:59+00',
    15.00, 75.00, 18.75, 1.50
);


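valid_to on the current row is a sentinel pointing at the year 2999 until the next price arrives; at that point you update this row and insert a new one. Note that the primary key on (model, valid_from) only stops two rows for a model from starting at the same instant. It does not reject overlapping ranges by itself, so close the old row and insert the new one in a single transaction.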
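A minimal sketch of that close-and-insert step, using an in-memory SQLite database as a stand-in for the warehouse so it runs anywhere. The table is trimmed to two rate columns, timestamps are stored as ISO-8601 text with a fixed UTC offset (so string comparison orders them correctly), and the July rates plus the `roll_price` and `rates_at` helpers are hypothetical.

```python
import sqlite3

OPEN_END = "2999-12-31T23:59:59+00:00"  # sentinel for "still current"
MODEL = "claude-opus-4-7"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_price_book (
        model               TEXT NOT NULL,
        valid_from          TEXT NOT NULL,
        valid_to            TEXT NOT NULL,
        input_per_mtok_usd  REAL NOT NULL,
        output_per_mtok_usd REAL NOT NULL,
        PRIMARY KEY (model, valid_from)
    )
""")
conn.execute(
    "INSERT INTO llm_price_book VALUES (?, ?, ?, ?, ?)",
    (MODEL, "2026-01-01T00:00:00+00:00", OPEN_END, 15.00, 75.00),
)
conn.commit()


def roll_price(conn, model, effective_from, input_rate, output_rate):
    """Close the open row and open a new one in a single transaction."""
    with conn:  # commits on success, rolls back if anything raises
        closed = conn.execute(
            "UPDATE llm_price_book SET valid_to = ? "
            "WHERE model = ? AND valid_to = ?",
            (effective_from, model, OPEN_END),
        ).rowcount
        if closed != 1:
            raise RuntimeError(f"expected one open row for {model}, found {closed}")
        conn.execute(
            "INSERT INTO llm_price_book VALUES (?, ?, ?, ?, ?)",
            (model, effective_from, OPEN_END, input_rate, output_rate),
        )


def rates_at(conn, model, ts):
    """Input rates whose validity range contains ts (should be exactly one)."""
    rows = conn.execute(
        "SELECT input_per_mtok_usd FROM llm_price_book "
        "WHERE model = ? AND ? >= valid_from AND ? < valid_to",
        (model, ts, ts),
    ).fetchall()
    return [row[0] for row in rows]


roll_price(conn, MODEL, "2026-07-01T00:00:00+00:00", 12.00, 60.00)

before = rates_at(conn, MODEL, "2026-03-15T12:00:00+00:00")
boundary = rates_at(conn, MODEL, "2026-07-01T00:00:00+00:00")
after = rates_at(conn, MODEL, "2026-08-01T00:00:00+00:00")
print(before, boundary, after)
```

Because the old row's valid_to and the new row's valid_from are the same instant and the range is half-open, every timestamp matches exactly one row, including the changeover moment itself.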

Then join usage to price on the validity range that contains the event, not on an exact timestamp:

SELECT
    u.customer_id,
    u.ts,
    u.model,
    u.input_tokens,
    u.output_tokens,
    u.cache_read_tokens,
    u.cache_write_tokens,
    (u.input_tokens       * p.input_per_mtok_usd
   + u.output_tokens      * p.output_per_mtok_usd
   + u.cache_write_tokens * p.cache_write_per_mtok_usd
   + u.cache_read_tokens  * p.cache_read_per_mtok_usd) / 1e6
        AS cost_usd
FROM llm_usage_events u
JOIN llm_price_book p
  ON p.model = u.model
 AND u.ts >= p.valid_from
 AND u.ts <  p.valid_to;


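Run this against any historical period and you get the price that applied to the customer's usage at the time, which is the price the model actually cost that day. Run it again next year and the answer is the same. That is what "replayable" means: the bill is a permanent record, not a reading of today's prices.

Gross vs. net: the two-number bill

There is one more wrinkle. Prompt caching changes the arithmetic. If every request sends the same 30k-token system prompt, the first call writes it to the cache at 1.25x the base input rate and subsequent calls read it at 0.1x. Gross cost is what the customer's traffic would have cost without caching; it is the number you want to bill on. Net cost is what you actually paid the provider; it is the number that left your wallet.

The two numbers differ, and they serve different purposes.

Finance wants net for the books. Customer success wants gross to show value ("look how much we saved you"). Sales wants both: gross as the headline, net for the margin model. Pick one and the questions never stop; report both and the conversation moves on.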

SELECT
    customer_id,
    DATE_TRUNC('day', ts) AS day,
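    -- gross: every prompt token priced as uncached input
    -- (assumes input_tokens excludes the cache_* token counts)
    SUM((input_tokens + cache_write_tokens + cache_read_tokens)
                           * p.input_per_mtok_usd
      + output_tokens      * p.output_per_mtok_usd) / 1e6
        AS gross_cost_usd,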
    SUM(input_tokens       * p.input_per_mtok_usd
      + output_tokens      * p.output_per_mtok_usd
      + cache_write_tokens * p.cache_write_per_mtok_usd
      + cache_read_tokens  * p.cache_read_per_mtok_usd) / 1e6
        AS net_cost_usd
FROM llm_usage_events u
JOIN llm_price_book p
  ON p.model = u.model AND u.ts >= p.valid_from AND u.ts < p.valid_to
GROUP BY customer_id, DATE_TRUNC('day', ts);


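Gross prices every input token as full-price input. Net prices the cached tokens at their cache rates. The difference between the two is your cache savings, and that is the number to put on the slide when a customer asks about cost efficiency.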
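A worked example makes the gap concrete. The rates come from the example price-book row above; the traffic shape (100 requests, 500 uncached input tokens and 300 output tokens each) is hypothetical, and the sketch assumes the usage record reports cached tokens separately from input_tokens.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)

# Per-million-token rates from the example price-book row.
RATES = {
    "input": Decimal("15.00"),
    "output": Decimal("75.00"),
    "cache_write": Decimal("18.75"),  # 1.25x the input rate
    "cache_read": Decimal("1.50"),    # 0.1x the input rate
}

# Hypothetical traffic: 100 requests sharing one 30k-token system prompt.
REQUESTS = 100
usage = {
    "input_tokens": REQUESTS * 500,                # uncached input only
    "output_tokens": REQUESTS * 300,
    "cache_write_tokens": 30_000,                  # first request writes the prompt
    "cache_read_tokens": (REQUESTS - 1) * 30_000,  # the other 99 read it
}


def net_cost(u):
    """What the provider actually charges: each bucket at its own rate."""
    return (
        u["input_tokens"] * RATES["input"]
        + u["output_tokens"] * RATES["output"]
        + u["cache_write_tokens"] * RATES["cache_write"]
        + u["cache_read_tokens"] * RATES["cache_read"]
    ) / MTOK


def gross_cost(u):
    """What the same traffic would cost with no caching at all."""
    prompt_tokens = (
        u["input_tokens"] + u["cache_write_tokens"] + u["cache_read_tokens"]
    )
    return (
        prompt_tokens * RATES["input"] + u["output_tokens"] * RATES["output"]
    ) / MTOK


gross = gross_cost(usage)
net = net_cost(usage)
cache_savings = gross - net
print(gross, net, cache_savings)
```

With these numbers the prompt cache absorbs most of the spend, so the gross figure is several times the net figure; the ratio depends entirely on how much of each request is the shared prompt.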

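The 80-line worker: daily aggregates, one row per customer

The span data lives in your tracing backend (Tempo, Honeycomb, whatever you use). You do not want finance querying that. You want a stable, append-only daily_llm_cost table that finance can join against the customer dimension tables they already trust.

A small worker reads yesterday's LLM spans, joins them to the price book, and writes one aggregate row per customer and model per day. Run it at 02:00 UTC. It is done before standup.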

import os
from datetime import date, datetime, timedelta, timezone
import psycopg

DSN = os.environ["WAREHOUSE_DSN"]

INSERT_SQL = """
INSERT INTO daily_llm_cost (
    day, customer_id, model,
    input_tokens, output_tokens,
    cache_read_tokens, cache_write_tokens,
    request_count, retry_count,
    gross_cost_usd, net_cost_usd
)
SELECT
    %(day)s::date AS day,
    s.customer_id,
    s.model,
    SUM(s.input_tokens),
    SUM(s.output_tokens),
    SUM(s.cache_read_tokens),
    SUM(s.cache_write_tokens),
    COUNT(*) FILTER (WHERE NOT s.is_retry),
    COUNT(*) FILTER (WHERE s.is_retry),
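    -- gross: every prompt token priced as uncached input
    -- (assumes input_tokens excludes the cache_* token counts)
    SUM((s.input_tokens + s.cache_write_tokens + s.cache_read_tokens)
                        * p.input_per_mtok_usd
      + s.output_tokens * p.output_per_mtok_usd) / 1e6,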
    SUM(s.input_tokens       * p.input_per_mtok_usd
      + s.output_tokens      * p.output_per_mtok_usd
      + s.cache_write_tokens * p.cache_write_per_mtok_usd
      + s.cache_read_tokens  * p.cache_read_per_mtok_usd) / 1e6
FROM llm_span_events s
JOIN llm_price_book p
  ON p.model = s.model
 AND s.ts >= p.valid_from
 AND s.ts <  p.valid_to
WHERE s.ts >= %(day_start)s
  AND s.ts <  %(day_end)s
  AND s.customer_id IS NOT NULL
GROUP BY s.customer_id, s.model
ON CONFLICT (day, customer_id, model) DO UPDATE
SET input_tokens       = EXCLUDED.input_tokens,
    output_tokens      = EXCLUDED.output_tokens,
    cache_read_tokens  = EXCLUDED.cache_read_tokens,
    cache_write_tokens = EXCLUDED.cache_write_tokens,
    request_count      = EXCLUDED.request_count,
    retry_count        = EXCLUDED.retry_count,
    gross_cost_usd     = EXCLUDED.gross_cost_usd,
    net_cost_usd       = EXCLUDED.net_cost_usd;
"""

def run(target_day: date) -> None:
    day_start = datetime.combine(target_day, datetime.min.time(),
                                 tzinfo=timezone.utc)
    day_end = day_start + timedelta(days=1)
    with psycopg.connect(DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(INSERT_SQL, {
                "day": target_day,
                "day_start": day_start,
                "day_end": day_end,
            })
        conn.commit()
    print(f"aggregated {target_day}")

if __name__ == "__main__":
    # default: yesterday in UTC. allow CLI override for backfills.
    import sys
    if len(sys.argv) > 1:
        run(date.fromisoformat(sys.argv[1]))
    else:
        # date.today() is local time; take "yesterday" from the UTC clock.
        run(datetime.now(timezone.utc).date() - timedelta(days=1))


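ON CONFLICT ... DO UPDATE makes the worker idempotent, provided daily_llm_cost has a unique constraint on (day, customer_id, model). Re-run it for the same day and it overwrites the same rows, which matters for backfills after a price-book correction and for spans that reach the warehouse late.

Schedule it with whatever already runs your ETL, whether that is Airflow or a Kubernetes CronJob. pg_cron works too. Five lines of config. Finance gets a stable, replayable daily_llm_cost table every morning by 02:30 UTC.

The catch: retries raise per-customer cost without raising revenue

This is the part that bites you in month two.

A customer request times out. The client retries. The retry falls back to a different model and succeeds, but the LLM provider has already charged for the input tokens of the first attempt. The customer saw one request. The bill shows two.

If you naively count spans, you have charged the customer for work that delivered nothing. Finance is happy. Customer success is not. Neither is the customer who asks why their bill went up 30% while their usage stayed the same.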

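Count retries separately. The aggregation worker above already filters them into the retry_count column. The reporting layer can then show two figures: total cost (what left your wallet) and billable cost (what you would charge the customer if you pass cost through).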
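One way the reporting layer could split the two figures is sketched below. The policy is an assumption, not something the worker above implements: a span tagged as a retry counts toward total cost but not toward billable cost. The span records, rates, and customer ID are illustrative.

```python
from collections import defaultdict
from decimal import Decimal

MTOK = Decimal(1_000_000)
INPUT_RATE = Decimal("15.00")   # USD per million input tokens
OUTPUT_RATE = Decimal("75.00")  # USD per million output tokens

# Hypothetical span records for one customer.
spans = [
    # Attempt 0 timed out: input was charged, nothing usable came back.
    {"customer_id": "cus_demo_123", "is_retry": False,
     "input_tokens": 8_000, "output_tokens": 0},
    # Attempt 1 is the retry that succeeded.
    {"customer_id": "cus_demo_123", "is_retry": True,
     "input_tokens": 8_000, "output_tokens": 400},
    # An unrelated request that worked first time.
    {"customer_id": "cus_demo_123", "is_retry": False,
     "input_tokens": 2_000, "output_tokens": 200},
]


def span_cost(span):
    """Provider charge for a single span."""
    return (
        span["input_tokens"] * INPUT_RATE + span["output_tokens"] * OUTPUT_RATE
    ) / MTOK


def split_costs(spans):
    """Per customer: total spend, retry overhead, and billable remainder."""
    buckets = defaultdict(lambda: {"total": Decimal(0), "retry": Decimal(0)})
    for span in spans:
        cost = span_cost(span)
        bucket = buckets[span["customer_id"]]
        bucket["total"] += cost
        if span["is_retry"]:
            bucket["retry"] += cost
    return {
        customer: {
            "total_cost_usd": b["total"],
            "retry_cost_usd": b["retry"],
            "billable_cost_usd": b["total"] - b["retry"],
        }
        for customer, b in buckets.items()
    }


report = split_costs(spans)
print(report["cus_demo_123"])
```

Whether the failed first attempt or the successful retry is the one the customer should pay for is a contract question; the point of the sketch is only that the split is impossible unless each span says which attempt it was.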

The part that is easy to forget: the span has to know it is a retry. Tag it at the call site:

span.set_attribute("llm.is_retry", attempt > 0)
span.set_attribute("llm.attempt", attempt)


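Without this, your retry_count column is always zero and the trap stays hidden until the first customer escalation. Put it in your LLM client wrapper and you never have to think about it again.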
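A minimal sketch of such a wrapper follows. The function name, the span name, and the retry policy (retry on TimeoutError with exponential backoff) are assumptions for illustration. The tracer is passed in so the example runs without OpenTelemetry installed; in a real service you would pass `opentelemetry.trace.get_tracer(__name__)`, whose `start_as_current_span` has the same shape as the recording stand-in used here.

```python
import time
from contextlib import contextmanager


def call_llm_with_retries(tracer, call, max_attempts=3, base_delay_s=0.0,
                          retry_on=(TimeoutError,)):
    """Run call() up to max_attempts times, emitting one span per attempt."""
    for attempt in range(max_attempts):
        with tracer.start_as_current_span("llm.call") as span:
            # The two attributes the aggregation depends on.
            span.set_attribute("llm.is_retry", attempt > 0)
            span.set_attribute("llm.attempt", attempt)
            try:
                return call()
            except retry_on:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the error
        time.sleep(base_delay_s * (2 ** attempt))  # simple exponential backoff


# Demo only: a tiny recording tracer so the sketch runs without OTel.
class _RecordingSpan:
    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


class _RecordingTracer:
    def __init__(self):
        self.spans = []

    @contextmanager
    def start_as_current_span(self, name):
        span = _RecordingSpan()
        self.spans.append(span)
        yield span


calls_made = []


def flaky_call():
    """Times out on the first call, succeeds on the second."""
    calls_made.append(1)
    if len(calls_made) == 1:
        raise TimeoutError("first attempt timed out")
    return "ok"


tracer = _RecordingTracer()
result = call_llm_with_retries(tracer, flaky_call)
recorded = [span.attributes for span in tracer.spans]
print(result, recorded)
```

Because the attempt counter lives in the wrapper's loop, individual call sites cannot forget to tag a retry.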

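The payoff: finance walks over and asks what Acme Corp cost last month, and you point at a table. The number is stable. It reconciles with the provider invoice to the cent. Retry cost is separated from organic cost, and gross is split from net. You did not touch the billing pipeline. You set baggage at three boundaries, kept an honest price-book table, and shipped one worker that fits on a screen.

How does your team handle per-customer LLM cost attribution: passing the customer ID through every call, propagating it via baggage, or ignoring it and arguing about the bill once a quarter? Share your approach in the comments.


If this was useful

The patterns above (baggage propagation, a versioned price book, gross vs. net reporting) are the everyday mechanics of turning a tracing system into a cost system. The cost attribution chapter of the LLM Observability Pocket Guide goes deeper on the dimensional model, the warehouse schema, and the alerting layer that catches a runaway customer before finance does. If you are building tracing into an LLM product and want to skip the year of retrofitted attribution, it is a shortcut.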

LLM Observability Pocket Guide: Picking the Right Tracing & Evals Tools for Your Team