Stop scattering LLM SDK/API calls across your codebase. Here is the 2-file rule that fixed mine
Babak Abbasc · 2026-05-23 · via DEV Community

I upgraded an LLM SDK and expected a routine version bump.

Instead I had to touch 15+ files, fix breaking changes across four providers, and spend the rest of the day hoping I had not missed one. That was the second time it happened. I knew there would be a third.

If you have ever shipped a production LLM system, you probably recognize the smell:

  • An SDK minor version renames maxTokens to maxOutputTokens and now 15 files break at runtime, not compile time.
  • Switching one classification task from Claude to a cheaper model means editing import paths and type signatures in business logic.
  • You have written classifyEmail, scoreLead, triageTicket, and categorizeRequest, and they are all the same function with a different prompt string.

This is not an SDK problem. It is an architecture problem. Here is how I fixed it, and the open-source library that came out of it.

The 2-file rule

I made one rule: only two files in the entire codebase are allowed to import the LLM SDK. One adapter that translates my interface into SDK calls, and one provider registry that creates clients from config. Everything else talks to a typed interface and has no idea which provider, model, or SDK is in play.

This is just hexagonal architecture (ports and adapters, per Alistair Cockburn) applied to LLMs. You already do this for databases and message queues. Nobody scatters raw SQL across business logic. LLM providers belong in the same category. They are infrastructure, not application logic.
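As a minimal sketch of that boundary, assuming a made-up SDK: `LlmPort`, `FakeSdkClient`, `createFakeAdapter`, and `triageEmail` below are illustrative names, not part of any real SDK or of the library described later. The only point is that the SDK's option names appear in exactly one function.

```typescript
// Port: the only LLM-related type that application code is allowed to see.
interface GenerateRequest {
  taskType: string;
  prompt: string;
  maxOutputTokens?: number;
}

interface LlmPort {
  generateText(req: GenerateRequest): Promise<string>;
}

// Stand-in for a vendor SDK (hypothetical). A real adapter file would import the real client here.
class FakeSdkClient {
  async complete(opts: { model: string; input: string; max_tokens: number }): Promise<{ text: string }> {
    return { text: `[${opts.model}] ${opts.input.slice(0, opts.max_tokens)}` };
  }
}

// Adapter: translates the port's vocabulary into SDK calls.
// If the SDK renames an option, this is the only place that changes.
function createFakeAdapter(client: FakeSdkClient, model: string): LlmPort {
  return {
    async generateText(req) {
      const res = await client.complete({
        model,
        input: req.prompt,
        max_tokens: req.maxOutputTokens ?? 256,
      });
      return res.text;
    },
  };
}

// Business logic depends on LlmPort only, so it never sees SDK types.
async function triageEmail(llm: LlmPort, body: string): Promise<string> {
  return llm.generateText({ taskType: "triage", prompt: `Triage this email: ${body}` });
}
```

In a test, `triageEmail` can be handed any object that satisfies `LlmPort`, which is also what makes the business logic easy to exercise without network calls.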

The dependency flow goes from this:

Application code
  ├─ direct SDK call
  ├─ direct SDK call
  └─ model router leaking SDK types


To this:

Application code
  ↓  llmClassify(), llmDraft(), llmScore() ...
Capabilities
  ↓
LLM Port  (TypeScript interface, zero SDK imports)
  ↓
Adapters + Provider Registry  (the only 2 files that touch the SDK)
  ↓
OpenAI / Anthropic / Gemini / Ollama / Vercel AI SDK


The caller says what it wants (taskType: "triage"). The infrastructure decides how. No model name parameter. No provider parameter. Policy is deferred to config.
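One way to picture "policy is deferred to config" is a lookup table keyed by task type. In the sketch below, the `RoutingConfig` shape, the `resolveRoute` helper, the alias names, and the model ids are all invented for illustration and are not the library's real config format.

```typescript
type ProviderAlias = "fast" | "smart";

interface ProviderConfig {
  provider: string;
  model: string;
}

interface RoutingConfig {
  providers: Record<ProviderAlias, ProviderConfig>;
  // Ordered preference list per task type; the first entry is tried first.
  routes: Record<string, ProviderAlias[]>;
  // Used when a task type has no explicit route.
  defaultRoute: ProviderAlias[];
}

// Hypothetical values; in practice this would be loaded from env or a config file.
const config: RoutingConfig = {
  providers: {
    fast: { provider: "anthropic", model: "example-small-model" },
    smart: { provider: "anthropic", model: "example-large-model" },
  },
  routes: { triage: ["fast", "smart"], draft: ["smart"] },
  defaultRoute: ["smart"],
};

// Resolve a task type to an ordered list of provider configs.
// Callers pass only the task type, never a model or provider name.
function resolveRoute(cfg: RoutingConfig, taskType: string): ProviderConfig[] {
  const aliases = cfg.routes[taskType] ?? cfg.defaultRoute;
  return aliases.map((alias) => cfg.providers[alias]);
}
```

With this shape, moving triage to a cheaper model is a one-line change to `config`, and no call site is edited.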

The proof: an SDK upgrade that did not hurt

The real test came during a major SDK version jump with breaking changes (maxTokens to maxOutputTokens, CoreMessage to ModelMessage, and more). Here is what the migration commit looked like:

  • 2 files changed (the adapter and the agent runtime), plus 1 minor fix.
  • All 18 activity files unchanged.
  • All 10 agent files unchanged.
  • The final migration deleted more code than it added: 192 insertions, 688 deletions.

28 out of 31 files did not change, because they do not know the SDK exists. If a core dependency upgrade touches your business logic, your boundaries are wrong.

The part that surprised me: the same 7 operations, everywhere

I started this to isolate the SDK. Then I noticed the bigger problem. I was not calling LLMs in 21 different places. I was reimplementing the same seven cognitive operations with slight variations:

Capability | What you give it             | What you get back
Classify   | content + rubric             | one label from an enum + reasoning
Score      | content + rubric + axes      | numeric ratings per axis
Draft      | persona + situation          | longer text in a chosen tone
Summarize  | long content + length target | shorter content, key points kept
Extract    | unstructured text + schema   | a typed structured object
Plan       | goal + constraints           | an ordered list of steps
Analyze    | evidence + question          | recommendation with caveats

Five activities classified content with five different prompt structures. Nine drafted messages with nine different tone injections. Same operation, no shared implementation. When I improved one classification prompt, I had to remember to update four other places. I usually forgot.

You are not writing 47 prompts. You are writing 7 prompts, 47 times, with slightly different ingredients.

So I extracted them into capability factories. A factory takes the invariant parts (schema, rubric, model routing, observability hooks) and returns a function that takes only the varying part (the content):

import { createClassifier } from "@llm-ports/capabilities";
import { z } from "zod";
import { llm } from "./llm"; // assumed path: whichever module exports your port

const IntentSchema = z.object({
  intent: z.enum(["question", "request", "complaint", "feedback", "other"]),
  urgency: z.enum(["low", "normal", "high"]),
  reasoning: z.string(),
});

export const classifyIntent = createClassifier({
  port: llm,                 // your provider-agnostic port
  schema: IntentSchema,
  schemaName: "user-intent",
  rubric: `
    question: asking for information
    request: wants something done
    complaint: reports a problem
    feedback: opinion only
    other: anything else
  `,
});


Then every call site, across all your files, is the same shape:

const result = await classifyIntent({ content: userMessage });
// { intent: "request", urgency: "high", reasoning: "..." }  fully typed


Improve the rubric once, and every call site that uses that classifier picks up the change. Prompt engineering stops being scattered strings and becomes a reusable system asset.
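For intuition, here is roughly what a capability factory does under the hood. This is a simplified sketch, not the library's implementation: `makeClassifier`, `TextPort`, and the JSON-in-prompt approach are assumptions made for the example, and it checks labels by hand instead of using zod.

```typescript
// Minimal port shape assumed for this sketch.
interface TextPort {
  generateText(req: { taskType: string; prompt: string }): Promise<string>;
}

interface ClassifierOptions<L extends string> {
  port: TextPort;
  labels: readonly L[];
  rubric: string;
  taskType: string;
}

// The factory closes over the invariant parts (labels, rubric, routing)
// and returns a function that takes only the varying part (the content).
function makeClassifier<L extends string>(opts: ClassifierOptions<L>) {
  return async function classify(input: { content: string }): Promise<{ label: L; reasoning: string }> {
    const prompt = [
      `Classify the content into one of: ${opts.labels.join(", ")}.`,
      `Rubric:\n${opts.rubric}`,
      `Content:\n${input.content}`,
      `Reply as JSON: {"label": "...", "reasoning": "..."}`,
    ].join("\n\n");

    const raw = await opts.port.generateText({ taskType: opts.taskType, prompt });

    // Validate at the capability boundary so bad output never reaches callers.
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) {
      throw new Error("expected a JSON object");
    }
    const { label, reasoning } = parsed as { label?: unknown; reasoning?: unknown };
    const match = opts.labels.find((l) => l === label);
    if (match === undefined || typeof reasoning !== "string") {
      throw new Error(`invalid classifier output: ${raw}`);
    }
    return { label: match, reasoning };
  };
}
```

Because the prompt template lives inside the factory, a wording improvement there reaches every classifier built from it, while each classifier still owns its own labels and rubric.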

llm-ports

I pulled this pattern out of my production system and shipped it as an open-source, MIT-licensed TypeScript library: llm-ports.

60-second setup

Configure providers in .env:

LLM_PROVIDER_FAST=anthropic|<model>|cost:50/day
LLM_PROVIDER_SMART=anthropic|<model>|cost:200/day
LLM_TASK_ROUTE_TRIAGE=fast,smart


Create the port once:

import { createRegistryFromEnv } from "@llm-ports/core";
import { createAnthropicAdapter } from "@llm-ports/adapter-anthropic";

export const llm = createRegistryFromEnv({
  adapters: {
    anthropic: createAnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY! }),
  },
}).getPort();


Use it anywhere, with no SDK imports:

const result = await llm.generateText({
  taskType: "triage",
  prompt: "Classify this email...",
});


The registry selects the right model for the task, enforces cost limits, falls back through the provider chain on budget exhaustion, and records usage, cost, and latency.
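The budget-driven fallback can be sketched in a few lines. Everything here is hypothetical (`BudgetedProvider`, `pickProvider`, `recordSpend`, `BudgetExhaustedError`, and the USD figures). It only illustrates the idea of walking the chain until a provider has room left, and throwing a typed error when none does.

```typescript
// Typed error so callers can handle budget exhaustion explicitly.
class BudgetExhaustedError extends Error {
  constructor(public readonly taskType: string) {
    super(`All providers for task "${taskType}" are over budget`);
    this.name = "BudgetExhaustedError";
  }
}

interface BudgetedProvider {
  alias: string;
  dailyLimitUsd: number;
  spentTodayUsd: number;
}

// Walk the chain in order and return the first provider that can afford the call.
function pickProvider(
  chain: BudgetedProvider[],
  estimatedCostUsd: number,
  taskType: string,
): BudgetedProvider {
  for (const provider of chain) {
    if (provider.spentTodayUsd + estimatedCostUsd <= provider.dailyLimitUsd) {
      return provider;
    }
  }
  throw new BudgetExhaustedError(taskType);
}

// Record actual spend after a call so later requests see the updated budget.
function recordSpend(provider: BudgetedProvider, costUsd: number): void {
  provider.spentTodayUsd += costUsd;
}
```

A real registry would also need to reset counters per window and estimate cost from token counts, which this sketch leaves out.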

What you get

  • Multi-provider routing across OpenAI, Anthropic, Google Gemini, Ollama, and the Vercel AI SDK.
  • Fallback chains when a provider exceeds budget.
  • USD-based cost gating with hourly, daily, and monthly limits. Budget exhaustion is a typed exception, not a surprise invoice.
  • The 7 capability factories: createClassifier, createScorer, createDrafter, createSummarizer, createExtractor, createPlanner, createAnalyzer.
  • Validation recovery for structured output. If a model returns invalid JSON or a wrong enum, it auto-retries with a correction prompt. Bad output stops at the capability boundary instead of leaking downstream.
  • Tool-use safety primitives: destructive markers, confirmation-required actions, max output byte limits.
  • Observability hooks for cost, latency, quality, and outcomes.
  • No runtime dependency on LangChain or LlamaIndex. Core plus one adapter plus capabilities is a small install footprint, with strict TypeScript throughout.
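The validation-recovery item in the list above describes a retry loop. A minimal sketch, assuming a hypothetical `generateValidated` helper and a hand-written validator rather than the library's actual mechanism:

```typescript
type Generate = (prompt: string) => Promise<string>;
// A validator returns the typed value or throws with a readable message.
type Validator<T> = (value: unknown) => T;

async function generateValidated<T>(
  generate: Generate,
  prompt: string,
  validate: Validator<T>,
  maxAttempts = 2,
): Promise<T> {
  let currentPrompt = prompt;
  let lastError = "no attempts made";

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(currentPrompt);
    try {
      // JSON.parse and validate both throw on bad output.
      return validate(JSON.parse(raw));
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Feed the failure back so the model can correct itself on the next attempt.
      currentPrompt =
        `${prompt}\n\nYour previous reply was invalid (${lastError}). ` +
        `Reply with valid JSON only.`;
    }
  }

  // Bad output stops here instead of leaking downstream.
  throw new Error(`Output still invalid after ${maxAttempts} attempts: ${lastError}`);
}
```

Capping the attempts matters: without `maxAttempts`, a model that keeps returning malformed output would retry indefinitely and keep spending budget.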

How it compares

  • Vercel AI SDK unifies provider calls. llm-ports adds the registry, fallback chains, USD cost gating, validation recovery, and capability factories on top. There is an adapter to migrate from it incrementally.
  • LiteLLM is a Python-first HTTP proxy. llm-ports is TypeScript and runs in-process, no extra network hop.
  • Portkey is a commercial hosted gateway. llm-ports is MIT and has no hosted dependency.
  • LangChain.js is a framework. llm-ports is a lightweight architecture and control layer, not a framework you build your whole app inside.

When to use it (and when not to)

Use it if you run 2+ providers (or might switch later), have 5+ call sites, keep getting bitten by SDK upgrades, or need cost control and centralized quality tracking.

Skip it if you have 1 or 2 LLM calls, you are just prototyping, or you want a full agent framework with a built-in memory and RAG layer.

Honest status

llm-ports is pre-release, currently at 0.1.0-alpha.5. The core architecture is stable with 250+ offline regression tests, but some adapter and agent paths are still being hardened (multi-turn agent in the Vercel adapter and retry-on-runtime-error both land in v0.2). The per-surface status is documented openly so you know what is solid before you adopt it.

Try it

npm install @llm-ports/core @llm-ports/adapter-anthropic @llm-ports/capabilities


If the capability-factory pattern matches how you are building, I would genuinely like feedback in GitHub Discussions. What shapes are you reimplementing that are not on the list of seven? What knobs do the capabilities need that they do not have yet?

The LLM stops being a dependency you manage. It becomes infrastructure you configure. Once you make that shift, everything else gets simpler.


Based on two longer write-ups: Ports and Adapters for AI and The 7 LLM Capabilities Every Production AI System Reimplements.