AI Billing is (mostly) token plumbing
jdenquin · 2026-05-27 · via Hacker News - Newest: "AI"

Why we built the Lago Agent SDK, and what we're shipping next.

We just released the Lago Agent SDK. Two libraries, Python and TypeScript. They wrap your LLM client and send token usage to Lago for billing. That's the surface.

The point is what you stop doing.

The token plumbing

Every team that shipped an AI feature in the last 18 months built the same thing. Smart search, inbox triage, meeting summaries, coding agents, vibe-coded apps. All of them ended up writing token-extraction middleware.

The middleware is the same job, repeated everywhere. Call an LLM. Parse the response for token counts. Attribute the call to a customer. Send the count to a billing system. Repeat for every provider, every model family, every streaming response, every retry, every cached call.

Every provider returns usage in a different shape:

openai_resp.usage.prompt_tokens
anthropic_resp.usage.input_tokens          # plus cache_creation_input_tokens, cache_read_input_tokens
bedrock_resp["usage"]["inputTokens"]       # camelCase, dict access, cache fields (e.g. cacheReadInputTokens) may be absent

Cache tokens have sub-types. Streaming responses bury usage in the last event, sometimes. Reasoning tokens are folded into output on some models, broken out on others. The schemas change every quarter.
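To make the streaming case concrete, here is a minimal sketch of capturing usage while a stream is consumed. The chunks are plain dicts standing in for a provider's stream events, and the assumption that usage arrives under a `usage` key on a late chunk is modeled loosely on providers that behave this way. `UsageCapture` and `tap_usage` are hypothetical names for illustration, not part of any SDK.

```python
from typing import Iterable, Iterator, Optional


class UsageCapture:
    """Holds the last usage payload seen while a stream is consumed."""

    def __init__(self) -> None:
        self.usage: Optional[dict] = None


def tap_usage(chunks: Iterable[dict], capture: UsageCapture) -> Iterator[dict]:
    """Yield every chunk unchanged, remembering any usage payload on the way."""
    for chunk in chunks:
        usage = chunk.get("usage")
        if usage:  # most chunks carry no usage; some streams never send any
            capture.usage = usage
        yield chunk


# Simulated stream (assumed shape): two content chunks, then one carrying usage.
fake_stream = [
    {"delta": "Hel", "usage": None},
    {"delta": "lo", "usage": None},
    {"delta": "", "usage": {"prompt_tokens": 12, "completion_tokens": 2}},
]

capture = UsageCapture()
text = "".join(chunk["delta"] for chunk in tap_usage(fake_stream, capture))
```

The caller still sees every chunk, and `capture.usage` stays `None` when a stream ends without reporting usage, which is the "sometimes" case a billing path has to handle explicitly.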

This is the token plumbing. Not differentiating, not what your AI feature is for, and it breaks every time a provider ships an update.

Two audiences, same plumbing

The B2B SaaS team adding AI to an existing product. Intercom shipping Fin on top of seat-based pricing. Notion layering AI as a per-seat add-on. Atlassian Intelligence rolling out across Jira and Confluence. The team has billed per-seat for years and now needs to charge for inference-backed features without rewriting the engine. Product wants AI live in two weeks. Engineering owns a sidecar nobody wants to maintain. The CFO wants to know if the feature has positive margin. Nobody can answer cleanly because token data lives in logs, not invoices.

The AI-native team building on top of LLMs. Cursor, Lovable, Replit, voice and browser agents. They pay a per-token rate to a model provider and bill the user with margin on top. Cost-plus, end to end. Every point of margin matters because COGS is variable per-customer and tracked in real time. Under-count and they bleed margin. Over-count and they lose trust. The middleware has to be exact, every release, for every model they add.

Both groups built the same plumbing. We're tired of building it.

Wrap once

Before, billing an LLM call looked something like this.

resp = client.converse(modelId="...", messages=[...])

usage = resp["usage"]
billing.send_event(customer_id, "llm_input_tokens",  usage["inputTokens"])
billing.send_event(customer_id, "llm_output_tokens", usage["outputTokens"])
billing.send_event(customer_id, "llm_cache_read",    usage.get("cacheReadInputTokens", 0))
# ... repeat for cache writes, tool calls, reasoning tokens, streaming chunks
# ... then write it all again, differently, for the next provider you add

After, you wrap the client once.

# OpenAI
client = sdk.wrap(OpenAI())
client.chat.completions.create(model="gpt-4o", messages=[...])

# Anthropic
client = sdk.wrap(Anthropic())
client.messages.create(model="claude-sonnet-4-5", messages=[...])

# Bedrock
client = sdk.wrap(boto3.client("bedrock-runtime"))
client.converse(modelId="...", messages=[...])

# token attribution happens automatically, per customer, across every provider

What lands in billing tells the story.

Old world. Anthropic returns one shape:

{
  "model": "claude-sonnet-4-5",
  "usage": {
    "input_tokens": 1200,
    "output_tokens": 340,
    "cache_creation_input_tokens": 800,
    "cache_read_input_tokens": 4000
  }
}

OpenAI returns another:

{
  "model": "gpt-4o",
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 340,
    "prompt_tokens_details": { "cached_tokens": 4000 }
  }
}

Different field names, different nesting, different cache semantics. You write one extractor per provider, map the fields, send one event per dimension. Then a model adds a new field and you do it again.

New world. The SDK normalizes both into the same canonical shape and batches them to Lago:

{
  "external_subscription_id": "sub_acme",
  "events": [
    { "code": "llm_input_tokens",         "properties": { "value": 1200 } },
    { "code": "llm_output_tokens",        "properties": { "value": 340  } },
    { "code": "llm_cached_input_tokens",  "properties": { "value": 4000 } },
    { "code": "llm_cache_creation_tokens","properties": { "value": 800  } }
  ]
}

Same event shape regardless of provider. Customer attribution is automatic. Cache fields populate when the provider returns them and stay absent when it doesn't.
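As a rough illustration of that normalization step, the sketch below maps the two payloads shown above onto the canonical batch. The event codes and field names come from the examples in this post; `FIELD_MAPS`, `_dig`, and `normalize` are hypothetical names, and this is not the SDK's actual implementation.

```python
from typing import Any

# Canonical event code -> key path inside each provider's usage object.
FIELD_MAPS: dict[str, dict[str, tuple[str, ...]]] = {
    "anthropic": {
        "llm_input_tokens": ("input_tokens",),
        "llm_output_tokens": ("output_tokens",),
        "llm_cached_input_tokens": ("cache_read_input_tokens",),
        "llm_cache_creation_tokens": ("cache_creation_input_tokens",),
    },
    "openai": {
        "llm_input_tokens": ("prompt_tokens",),
        "llm_output_tokens": ("completion_tokens",),
        "llm_cached_input_tokens": ("prompt_tokens_details", "cached_tokens"),
    },
}


def _dig(obj: Any, path: tuple[str, ...]) -> Any:
    """Follow a key path through nested dicts; None if any key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def normalize(provider: str, response: dict, subscription_id: str) -> dict:
    """Map one provider response onto the canonical event batch."""
    usage = response.get("usage", {})
    events = []
    for code, path in FIELD_MAPS[provider].items():
        value = _dig(usage, path)
        if value is not None:  # absent fields produce no event
            events.append({"code": code, "properties": {"value": value}})
    return {"external_subscription_id": subscription_id, "events": events}


anthropic_resp = {
    "model": "claude-sonnet-4-5",
    "usage": {
        "input_tokens": 1200,
        "output_tokens": 340,
        "cache_creation_input_tokens": 800,
        "cache_read_input_tokens": 4000,
    },
}
openai_resp = {
    "model": "gpt-4o",
    "usage": {
        "prompt_tokens": 1200,
        "completion_tokens": 340,
        "prompt_tokens_details": {"cached_tokens": 4000},
    },
}

batch_a = normalize("anthropic", anthropic_resp, "sub_acme")
batch_o = normalize("openai", openai_resp, "sub_acme")
```

This is a straight field rename. It does not reconcile what each provider counts as input versus cached tokens, which a real normalizer would have to decide per provider.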

The wrapped client behaves identically to the original. Same arguments, same return shape, same exceptions. The SDK extracts usage from every response, normalizes it across providers, attributes it to a customer subscription, and streams events to Lago in batches. Overhead in the low milliseconds. If anything in the SDK fails, the LLM call still returns.
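The "if anything fails, the LLM call still returns" property can be sketched as a thin proxy. This is an illustration under stated assumptions, not how the Lago SDK is built: `BilledClient`, `FakeClient`, and the `report` callback are hypothetical, and the sketch only intercepts one top-level method rather than nested ones like `client.chat.completions.create`.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("billing_wrapper")


class BilledClient:
    """Proxy that forwards to the real client and reports usage on the side."""

    def __init__(self, client: Any, method: str, report: Callable[[dict], None]) -> None:
        self._client = client
        self._method = method
        self._report = report

    def __getattr__(self, name: str) -> Any:
        attr = getattr(self._client, name)
        if name != self._method:
            return attr  # every other attribute passes through untouched

        def billed_call(*args: Any, **kwargs: Any) -> Any:
            response = attr(*args, **kwargs)  # provider exceptions propagate as-is
            try:
                self._report(response["usage"])
            except Exception:
                # Billing problems are logged, never raised into the caller.
                logger.exception("usage reporting failed")
            return response

        return billed_call


class FakeClient:
    """Stand-in for a provider client, so the sketch runs without network access."""

    def converse(self, **kwargs: Any) -> dict:
        return {"output": "hi", "usage": {"inputTokens": 5, "outputTokens": 1}}


def broken_report(usage: dict) -> None:
    raise RuntimeError("billing backend unavailable")


seen: list = []
resp_ok = BilledClient(FakeClient(), "converse", seen.append).converse(modelId="demo")
resp_safe = BilledClient(FakeClient(), "converse", broken_report).converse(modelId="demo")
```

Both calls return the same response; only the first records usage. The design choice is that the billing side channel sits after the provider call and inside its own `try`, so it can neither alter the response nor surface its own failures to the application.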

No migration. The application calls the model the same way it did yesterday.

Two layers, two jobs

Most teams have infrastructure around their LLM calls. Edge proxies for caching repeated prompts. AI gateways for fallback routing and rate limits. Observability layers for latency and error tracking. Edge inference hosts for region-locality. These layers protect margin and user experience.

The SDK composes with them. It runs in your application process, alongside whatever you already use. If your stack runs through Cloudflare AI Gateway, the Gateway keeps doing its job and the SDK reads the response that comes back through it. Same for Bedrock with API Gateway in front, an edge setup on Workers AI, or a self-hosted LiteLLM proxy.

Two layers, two jobs. Your existing stack knows about your traffic: what got cached, what got retried, what was slow. The SDK knows about your customers: which subscription this call belongs to, what feature it was billed against, what margin tier the customer is on. Caching savings show up in your cost line. Token counts show up on the customer's invoice. Both layers see the same response, so the math agrees.

The pricing-table promise

The SDK gets tokens out of the response and into billing. It does not yet tell you what those tokens cost.

If you're billing cost-plus today, you maintain your own pricing table. Per-model input rate. Per-model output rate. Cache read and cache write with separate TTL tiers. Long-context surcharges. Reasoning tokens. The table moves every time a provider posts a blog. You're updating a YAML file in your repo and hoping nobody forgot the last change.

The next thing we're shipping is the table itself. Lago maintains current per-model pricing for every major provider. You set a markup. We compute cost from the token counts the SDK already captures, apply your margin, and charge the customer. You stop tracking provider price changes. You stop reconciling cost-plus math at month-end.
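The arithmetic behind "compute cost, apply your margin" is simple enough to sketch. The rates below are hypothetical placeholders in USD per million tokens, not any provider's real prices, and `RATES`, `provider_cost`, and `customer_price` are illustrative names rather than a Lago API. The token counts reuse the canonical example from earlier in the post.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# HYPOTHETICAL rates, USD per million tokens. Not real provider pricing.
RATES = {
    "demo-model": {
        "llm_input_tokens": Decimal("2.00"),
        "llm_output_tokens": Decimal("10.00"),
        "llm_cached_input_tokens": Decimal("0.20"),
        "llm_cache_creation_tokens": Decimal("2.50"),
    }
}


def provider_cost(model: str, usage: dict[str, int]) -> Decimal:
    """Sum tokens x rate across every dimension the SDK captured."""
    rates = RATES[model]
    return sum(
        (Decimal(count) * rates[code] / PER_MILLION for code, count in usage.items()),
        Decimal(0),
    )


def customer_price(cost: Decimal, markup: Decimal) -> Decimal:
    """Cost-plus: provider cost with a fractional markup on top."""
    return cost * (Decimal(1) + markup)


usage = {
    "llm_input_tokens": 1200,
    "llm_output_tokens": 340,
    "llm_cached_input_tokens": 4000,
    "llm_cache_creation_tokens": 800,
}

cost = provider_cost("demo-model", usage)      # equals Decimal("0.0086")
price = customer_price(cost, Decimal("0.20"))  # equals Decimal("0.01032")
```

`Decimal` is used instead of floats so that per-call amounts this small do not pick up rounding noise before they are aggregated onto an invoice.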

For AI-native teams, that's pass-through cost with a clean markup, kept honest by infrastructure that updates when the providers update. For B2B SaaS adding AI features, the same table answers the margin question the CFO keeps asking, without anyone maintaining a spreadsheet.

The gap

The gap sits between "the LLM returned tokens" and "the customer got billed for tokens." Every customer-facing team building AI owns the code that bridges it. Most have a half-finished plan to extend that code for the next provider.

It's the most code per dollar of value of anything in your stack. Someone has to own it. It should not be every team in the industry, in parallel, separately, forever.

Try it

The libraries are on GitHub today.