PydanticAI vs LangChain - Choosing an Agent Framework for Production, Not Demos
Developer Service · 2026-06-22 · via DEV Community

In a recent audit, a team showed me an AI assistant they'd built on top of their company knowledge base. The demo had landed well: ask how to use a feature, and it walked through the exact pain point their support queue kept seeing. Leadership signed off.

In production, the same agent told a user to open a menu option that didn't exist. Not a vague answer - a specific UI path, stated with confidence. Nobody caught it in testing. It surfaced when I audited the system, not when a user complained.

The prototype passed testing because nobody was checking whether the answer matched the product. In production, that gap becomes a liability: the model invents UI paths, and your backend has no schema to reject them.

When you're choosing an agent framework, popularity is the wrong scorecard. Pick the one that fails loudly in development and gracefully in production - or you'll find out in audit.


What "Production-Ready" Actually Requires

Tutorial agents are built to impress in a fifteen-minute demo. Production agents run unattended, handle bad inputs, and ship answers your backend has to trust. The gap between those two goals is where most teams stumble - and it's rarely visible until something reaches a user.

When I audit agent codebases, I evaluate five things the tutorials skip:

  • Structured, validated outputs: Can your system reject an invented menu path before it becomes user-facing advice?

  • Dependency injection for testing: Can you swap the knowledge base for a mock in CI without rewiring the agent?

  • Retry and error handling: When the model returns malformed output, does the framework retry - or do you ship a parser exception?

  • Observability hooks: Can you trace which document grounded a bad answer when support escalates?

  • Type-checker support: Will static analysis catch a breaking API change before deploy, or after the agent silently misbehaves?

If you want to score your own system, the Production Readiness Audit covers the same ground - deployment, observability, failure modes - and ends with a prioritized remediation plan.


Side-by-Side: The Same Agent, Two Frameworks

The first item on the rubric is structured, validated outputs. The clearest way to see the framework difference is to build the same agent twice.

The task: answer natural-language questions about a CSV of sales data. The agent calls a tool to query the file, then returns a structured answer your API can pass downstream without a second parsing step.

LangChain

import pandas as pd
from langchain.agents import create_agent
from langchain.tools import tool

# Assumption: the sales data is a local CSV with "region" and "revenue" columns.
df = pd.read_csv("sales.csv")  # hypothetical path

@tool
def query_sales_csv(region: str) -> str:
    """Return total revenue for a region in the sales CSV."""
    total = df.loc[df["region"] == region, "revenue"].sum()
    return f"{region}: ${total:,.0f}"

agent = create_agent("anthropic:claude-sonnet-4-6", tools=[query_sales_csv])
result = agent.invoke({
    "messages": [{"role": "user", "content": "What was Q1 revenue in Europe?"}],
})

# Free-form model output: you validate the shape yourself
answer = result["messages"][-1].content

This is the pattern most tutorials teach. The tool works, the agent runs, the demo looks fine. But answer is a string (or a list of content blocks, depending on the model). Nothing in this flow checks that the response contains a real region name, a numeric revenue, or the right currency. If the model formats the answer as prose instead of data, your code finds out in production - or in audit.

LangChain does support a response_format parameter with Pydantic models. It's opt-in, and most teams I audit haven't wired it up yet.
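To make "you validate the shape yourself" concrete, here is a minimal sketch of the hand-rolled check a team ends up writing when the agent returns free text. It assumes the answer follows the tool's "Region: $1,250,000" format; ParsedAnswer, parse_answer, and known_regions are hypothetical names for illustration, not part of LangChain.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedAnswer:
    """Hypothetical typed result recovered from a free-text answer."""
    region: str
    total_revenue: float


# Matches strings like "Europe: $1,250,000" and nothing looser.
_ANSWER = re.compile(r"^(?P<region>[A-Za-z ]+): \$(?P<amount>[\d,]+(?:\.\d+)?)$")


def parse_answer(text: str, known_regions: set[str]) -> ParsedAnswer:
    """Reject prose, unknown regions, and non-numeric amounts."""
    match = _ANSWER.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected answer format: {text!r}")
    region = match["region"]
    if region not in known_regions:
        raise ValueError(f"unknown region: {region!r}")
    return ParsedAnswer(region, float(match["amount"].replace(",", "")))


parsed = parse_answer("Europe: $1,250,000", {"Europe", "Asia"})
```

Every rule here (format, region whitelist, numeric amount) is one you have to remember to write and keep in sync with the prompt.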

PydanticAI

from pydantic import BaseModel
from pydantic_ai import Agent

class SalesAnswer(BaseModel):
    region: str
    total_revenue: float
    currency: str = "USD"

agent = Agent("anthropic:claude-sonnet-4-6", output_type=SalesAnswer)

@agent.tool_plain
def query_sales_csv(region: str) -> float:
    """Return total revenue for a region in the sales CSV."""
    # df is the pandas DataFrame holding the sales CSV, loaded at module level.
    return float(df.loc[df["region"] == region, "revenue"].sum())

result = agent.run_sync("What was Q1 revenue in Europe?")
answer = result.output  # SalesAnswer, validated before your code runs

Here, validation isn't a step you add later - it's the contract. output_type=SalesAnswer tells the agent what shape to return. If the model produces something that doesn't match - a wrong field name, a missing revenue, a non-numeric total - PydanticAI rejects it before your application code touches it. (An invented region is still a valid str; catching that takes a tighter type such as an enum, or an output validator.) You get a SalesAnswer object your type checker understands, not a string you hope to parse.

Same task, same tool, same model. The difference is what happens after the LLM responds: LangChain hands you text and trusts you'll validate it; PydanticAI hands you a typed object or fails immediately.
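The validation step itself is plain Pydantic, so you can see what it catches without running an agent. The sketch below is an illustration under assumptions: StrictSalesAnswer and the three-region Literal are hypothetical, added to show how a tighter type rejects a region a bare str would accept.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

# Hypothetical whitelist; in practice it would come from the CSV's region column.
Region = Literal["Europe", "Americas", "Asia"]


class StrictSalesAnswer(BaseModel):
    region: Region
    total_revenue: float
    currency: str = "USD"


def failing_fields(payload: dict) -> list[str]:
    """Return the names of fields that fail validation (empty list if valid)."""
    try:
        StrictSalesAnswer.model_validate(payload)
    except ValidationError as exc:
        return [str(error["loc"][0]) for error in exc.errors()]
    return []


good = failing_fields({"region": "Europe", "total_revenue": 1_250_000})
bad = failing_fields({"region": "Atlantis", "total_revenue": "approximately high"})
missing = failing_fields({"region": "Europe"})
```

The second payload is well-formed JSON and would pass a "did it parse?" check; it only fails because the schema says what the fields must be.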


Dependency Injection & Testability

Validated outputs tell you the shape is right. Dependency injection tells you the data is right - and lets you prove it without calling a live API on every CI run.

Agent tools don't operate in a vacuum. They read from databases, knowledge bases, and internal APIs. In production, those dependencies are real. In tests, they need to be fake - predictable, fast, and free. The question is whether your framework makes that swap explicit or forces you to hack around it.

PydanticAI: dependencies as a first-class parameter

PydanticAI declares what an agent needs via deps_type. Tools receive a RunContext and pull dependencies from ctx.deps. At run time, you pass the real implementation; in tests, you pass a fake.

from dataclasses import dataclass
from pydantic_ai import Agent, RunContext

@dataclass
class SalesDataSource:
    def revenue_for(self, region: str) -> float:
        return float(df.loc[df["region"] == region, "revenue"].sum())

agent = Agent(
    "anthropic:claude-sonnet-4-6",
    deps_type=SalesDataSource,
    output_type=SalesAnswer,
)

@agent.tool
def query_sales_csv(ctx: RunContext[SalesDataSource], region: str) -> float:
    return ctx.deps.revenue_for(region)

# Production: agent.run_sync(prompt, deps=SalesDataSource())
# Test:       agent.run_sync(prompt, deps=FakeSalesData(revenue=1_250_000))

The type checker enforces the contract. If a tool expects SalesDataSource and you pass something else, mypy catches it before merge. Your test injects FakeSalesData(revenue=1_250_000) and asserts the agent's structured output matches - no CSV file, no network, no API key in CI.
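FakeSalesData isn't defined in the snippets above, so here is one way it could look. This is a sketch, not the only shape that works: it assumes the fake subclasses SalesDataSource so it satisfies deps_type, and lookup is a hypothetical stand-in for the tool body so the example runs without PydanticAI installed.

```python
from dataclasses import dataclass


class SalesDataSource:
    """Stand-in for the production data source from the example above."""

    def revenue_for(self, region: str) -> float:
        raise NotImplementedError("the real implementation reads the CSV")


@dataclass
class FakeSalesData(SalesDataSource):
    """Test double: returns one fixed revenue figure for any region."""
    revenue: float

    def revenue_for(self, region: str) -> float:
        return self.revenue


def lookup(deps: SalesDataSource, region: str) -> float:
    # Mirrors the tool body: everything it needs arrives through deps.
    return deps.revenue_for(region)


fake_total = lookup(FakeSalesData(revenue=1_250_000), "Europe")
```

Because the fake is a subclass, a type checker accepts it anywhere a SalesDataSource is expected, and the tool never knows the difference.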

LangChain: it works, but the seams are yours to find

LangChain agents can be tested, but the framework doesn't give you an injection point. The usual pattern is a module-level dependency the tool closes over, then unittest.mock.patch in tests:

from unittest.mock import patch
from langchain.agents import create_agent
from langchain.tools import tool

data_source = SalesDataSource()  # module-level — no framework injection point

@tool
def query_sales_csv(region: str) -> str:
    """Return total revenue for a region in the sales CSV."""
    return f"{region}: ${data_source.revenue_for(region):,.0f}"

agent = create_agent("anthropic:claude-sonnet-4-6", tools=[query_sales_csv])

# Test: patch the module where data_source lives
with patch("myapp.sales_agent.data_source", FakeSalesData(revenue=1_250_000)):
    result = agent.invoke({"messages": [{"role": "user", "content": "Q1 Europe?"}]})

Here you're patching a string path - "myapp.sales_agent.data_source" - that must match exactly where the module is imported. Rename the file, change the import structure, or run tests in parallel that share a patched global, and you get flakes or false greens.
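The "must match exactly" rule is easy to get wrong, so here is a self-contained demonstration using only the standard library. The two in-memory modules (demo_source and demo_consumer) are throwaway names invented for this sketch; they stand in for the module that defines data_source and the module that imports it by name.

```python
import sys
import types
from unittest.mock import patch

# Module that defines the dependency.
source = types.ModuleType("demo_source")
source.data_source = "real"
sys.modules["demo_source"] = source

# Module that copies the name at import time, as `from x import data_source` does.
consumer = types.ModuleType("demo_consumer")
sys.modules["demo_consumer"] = consumer
exec(
    "from demo_source import data_source\n"
    "def read():\n"
    "    return data_source\n",
    consumer.__dict__,
)

# Patching where the object is defined leaves the consumer's copy untouched.
with patch("demo_source.data_source", "fake"):
    seen_when_patching_source = consumer.read()

# Patching where the name is looked up is what actually takes effect.
with patch("demo_consumer.data_source", "fake"):
    seen_when_patching_consumer = consumer.read()
```

The first patch succeeds without error and changes nothing the tool sees, which is how a test ends up green while exercising the real dependency.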

If you've fought flaky agent tests, you've lived this. The test doesn't fail because the agent logic is wrong; it fails because the test setup is fighting the framework's defaults.

PydanticAI doesn't eliminate the need to write tests. It gives you a seam that was designed for swapping. That's the difference between "we test our agents in CI" and "we test our agents in CI reliably."


Error Handling and Retries in Practice

When the model returns garbage, what happens next? In production, "garbage" isn't always obvious - it's a well-formed JSON object with total_revenue: "approximately high" or a region name the CSV doesn't contain. The framework should catch that and recover, not pass it to your API.

PydanticAI: validation failures feed back to the model

When output_type is a Pydantic model, schema violations don't reach your application code. PydanticAI sends the validation error back to the model and retries:

from pydantic_ai import Agent, ModelRetry

agent = Agent(
    "anthropic:claude-sonnet-4-6",
    output_type=SalesAnswer,
    retries=3,
)

@agent.output_validator
def must_have_revenue(ctx, output: SalesAnswer) -> SalesAnswer:
    if output.total_revenue <= 0:
        raise ModelRetry("Revenue must be positive. Call query_sales_csv and retry.")
    return output

retries=3 covers structural failures - wrong types, missing fields, malformed JSON. The @agent.output_validator covers business rules the schema can't express. Both paths end the same way: the error message goes back to the model as context for another attempt, whether it comes from a schema violation or from a ModelRetry you raise yourself. If the retries run out, you get an explicit exception - not a silent bad record in your database.
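As a mental model only, the loop looks roughly like the sketch below. This is not PydanticAI's implementation: RetryRequested, run_with_feedback, and scripted_model are hypothetical names, and the "model" is a plain function so the example runs offline.

```python
from typing import Callable


class RetryRequested(Exception):
    """Raised by a validator to ask the model for another attempt."""


def run_with_feedback(
    model: Callable[[list[str]], dict],
    validate: Callable[[dict], dict],
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Call the model, feeding each validation error back as context."""
    messages = [prompt]
    for _ in range(max_attempts):
        raw = model(messages)
        try:
            return validate(raw)
        except RetryRequested as exc:
            messages.append(f"Validation failed: {exc}")
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def must_have_revenue(output: dict) -> dict:
    revenue = output.get("total_revenue")
    if not isinstance(revenue, (int, float)) or revenue <= 0:
        raise RetryRequested("total_revenue must be a positive number")
    return output


def scripted_model(messages: list[str]) -> dict:
    # Returns garbage first, then a valid payload once it has seen feedback.
    if len(messages) == 1:
        return {"region": "Europe", "total_revenue": "approximately high"}
    return {"region": "Europe", "total_revenue": 1_250_000.0}


result = run_with_feedback(scripted_model, must_have_revenue, "Q1 revenue in Europe?")
```

The point is the control flow: a failed validation becomes another message in the conversation, and exhaustion is an exception instead of a bad record.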

LangChain: you assemble the loop

LangChain can validate structured output via response_format, but retry orchestration is still your code. Schema errors, business rules, retry limits, and message history - you wire it together:

from pydantic import ValidationError
from langchain.agents import create_agent

MAX_RETRIES = 3

agent = create_agent(
    "anthropic:claude-sonnet-4-6",
    tools=[query_sales_csv],
    response_format=SalesAnswer,
)

messages = [{"role": "user", "content": "What was Q1 revenue in Europe?"}]
for attempt in range(MAX_RETRIES):
    result = agent.invoke({"messages": messages})
    try:
        answer = SalesAnswer.model_validate(result["structured_response"])
        if answer.total_revenue <= 0:
            raise ValueError("Revenue must be positive")
        break
    except (ValidationError, ValueError) as e:
        messages = result["messages"] + [
            {"role": "user", "content": f"Validation failed: {e}. Try again."}
        ]
else:
    raise RuntimeError("Max retries exceeded")

LangChain also offers ToolStrategy with a handle_errors parameter for schema-level retries - closer to PydanticAI's defaults. But business-rule validation like total_revenue > 0 still lands in your loop. And teams that skip response_format entirely write even more of this by hand: parse JSON from message text, catch OutputParserException, append errors, track attempts.
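For a sense of what "parse JSON from message text" means in practice, here is a minimal standard-library sketch. extract_json is a hypothetical helper, not a LangChain API; it assumes the model wrapped a single JSON object in conversational prose.

```python
import json


def extract_json(text: str) -> dict:
    """Return the first JSON object embedded in free-form model output."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(candidate, dict):
            return candidate
    raise ValueError("no JSON object found in model output")


payload = extract_json(
    'Sure! {"region": "Europe", "total_revenue": 1250000.0} Hope that helps.'
)
```

Even with this in place you still only have a dict; the schema and business-rule checks come after.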

The operational difference: PydanticAI treats "output didn't validate" as a normal agent loop event. LangChain treats it as an exception you handle - if you remembered to write the handler.


Testing Without Hitting the API

In the previous sections, you saw how to test agent logic. This section is about testing without paying for it. Every CI run that calls a real LLM costs money, adds latency, and flakes when the model paraphrases. Most teams know this; fewer build around it.

PydanticAI ships test doubles for the model itself. TestModel replaces the LLM with deterministic Python: it calls your tools, generates schema-valid output, and never hits the network. FunctionModel goes further - you write the mock responses in plain Python when you need specific behavior.

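from pydantic_ai.models.test import TestModel

def test_sales_agent_returns_validated_output():
    # TestModel fills the output with schema-valid placeholder values, not real
    # figures, so assert on the shape here and keep value checks for FunctionModel.
    with agent.override(model=TestModel()):
        result = agent.run_sync(
            "What was Q1 revenue in Europe?",
            deps=FakeSalesData(revenue=1_250_000),
        )
    assert isinstance(result.output, SalesAnswer)
    assert isinstance(result.output.total_revenue, float)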

agent.override(model=TestModel()) swaps the model at the agent boundary - the same boundary where you pass deps= in the dependency-injection examples. Your application code doesn't change; your test doesn't need an API key.

LangChain: script every model turn

LangChain's equivalent is GenericFakeChatModel - you pass an iterator of scripted AIMessage responses, one per model invocation in the agent loop:

from unittest.mock import patch

from langchain.agents import create_agent
from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
from langchain_core.messages import AIMessage, ToolCall

# One scripted AIMessage per model invocation, in order.
fake_model = GenericFakeChatModel(messages=iter([
    AIMessage(content="", tool_calls=[
        ToolCall(name="query_sales_csv", args={"region": "Europe"}, id="call_1"),
    ]),
    AIMessage(content='{"region": "Europe", "total_revenue": 1250000.0}'),
]))

with patch("myapp.sales_agent.data_source", FakeSalesData(revenue=1_250_000)):
    test_agent = create_agent(
        fake_model, tools=[query_sales_csv], response_format=SalesAnswer,
    )
    result = test_agent.invoke({"messages": [{"role": "user", "content": "Q1 Europe?"}]})
    assert SalesAnswer.model_validate(result["structured_response"]).total_revenue > 0

Here you're scripting each turn: first a tool call, then a JSON payload. Add a retry path or a second tool and you extend the iterator. Miss a turn and the test fails with an opaque StopIteration.
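If you stay with scripted turns, a thin wrapper can at least make the failure readable. The sketch below is framework-free: ScriptedTurns is a hypothetical helper, and strings stand in for the AIMessage objects a real script would hold.

```python
from collections.abc import Iterable, Iterator


class ScriptedTurns:
    """Serve scripted model turns and fail loudly when the script runs out."""

    def __init__(self, turns: Iterable[str]) -> None:
        self._turns: Iterator[str] = iter(turns)
        self._served = 0

    def next_turn(self) -> str:
        try:
            turn = next(self._turns)
        except StopIteration:
            # Replace the bare StopIteration with a message that names the cause.
            raise AssertionError(
                f"script ran out after {self._served} turn(s); "
                "the agent loop asked for one more"
            ) from None
        self._served += 1
        return turn


script = ScriptedTurns(["tool_call: query_sales_csv", "final: structured answer"])
first = script.next_turn()
second = script.next_turn()
```

A third call now fails with a message that says how many turns were scripted, which is usually enough to spot the missing retry or tool step.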

For teams running agent tests on every PR, that's the single biggest cost-saving win I see when moving to PydanticAI: tests on every commit, no API bill, assertions on typed output.


Where LangChain Still Wins

This article argues for PydanticAI on production grounds. That case is weaker if you're not heading to production yet - and LangChain earns its popularity for good reasons.

Integration breadth: LangChain connects to more data sources, vector stores, and model providers out of the box. If you need a connector that PydanticAI doesn't ship yet - a niche CRM, an internal protobuf service, a legacy search index - LangChain's community has probably already built it.

Prototyping speed: Pre-built chains, LangGraph templates, and copy-paste tutorials get a demo in front of stakeholders fast. For a two-week proof of concept where the goal is "show the CEO something that talks", that velocity matters more than typed outputs.

Ecosystem maturity: More Stack Overflow answers, more LangSmith integrations, more hiring-market familiarity. If your team already knows LangChain and the project might not ship, switching frameworks adds cost with no payoff.

None of this makes LangChain the wrong choice for production - teams ship reliable LangChain agents every day. But they do it by adding the validation, testing, and error-handling layers this article shows PydanticAI includes by default. If you're building those layers yourself anyway, the framework choice matters less.

Choose LangChain when speed and integrations beat type safety. Choose PydanticAI when you're done demoing and need the agent to run without you in the room.


Decision Checklist

Bring this to your next architecture review. Seven yes/no questions - answer for the project you're actually building, not the demo you already shipped:

  1. Do you need typed outputs your backend can trust without re-validation?
  2. Will this agent run unattended in production?
  3. Do you need to unit-test agent logic in CI without API calls?
  4. When the model returns malformed output, should the framework retry automatically?
  5. Do you need to trace which document or tool call produced a given answer?
  6. Will static analysis catch breaking changes to your output schema before deploy?
  7. Do you need a niche integration that only exists in LangChain's ecosystem today?

Scoring: Three or more yes on questions 1–6 → lean PydanticAI. Mostly no, or yes only on question 7 → LangChain is fine for now. Yes on 1–3 but no on 4–6 → you're heading to production but haven't built the hard parts yet. Either framework works, but budget time for the gaps this article mapped in the sections on dependency injection, error handling, and testing.
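The scoring rules can be written down as a small function, which also forces a decision the prose leaves open: "yes on 1-3, no on 4-6" is itself three yeses. The sketch below assumes that special case is checked first and reads "mostly no" as fewer than three yeses on questions 1-6; recommend is a hypothetical name.

```python
def recommend(answers: list[bool]) -> str:
    """Map seven yes/no checklist answers (questions 1-7, in order) to a leaning."""
    if len(answers) != 7:
        raise ValueError("expected exactly seven answers")
    first_three, next_three = answers[:3], answers[3:6]
    # Assumption: the "heading to production" case takes precedence.
    if all(first_three) and not any(next_three):
        return "either - budget time for the gaps"
    if sum(answers[:6]) >= 3:
        return "lean PydanticAI"
    # Fewer than three yeses on 1-6, including "yes only on question 7".
    return "LangChain is fine for now"


all_production = recommend([True] * 6 + [False])
only_integration = recommend([False] * 6 + [True])
halfway = recommend([True, True, True, False, False, False, False])
```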

For a full assessment across your codebase - not just the framework choice - see the Production Readiness Audit. Same categories, applied to what you've actually shipped.


Conclusion

If the checklist pointed you toward PydanticAI but you're already on LangChain, you don't need a rewrite.

Your tools - the functions that query databases, search knowledge bases, and call internal APIs - are plain Python. Wrap them as PydanticAI @agent.tool handlers and migrate one agent at a time. Run both frameworks side by side during the transition; retire LangChain paths as each agent passes the production tests you couldn't write before.
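Migration is easiest when the tool logic doesn't import either framework. Here is a minimal sketch of that separation, assuming rows shaped like the sales CSV; revenue_for_region and the sample data are made up for illustration.

```python
import csv
import io


def revenue_for_region(rows: list[dict[str, str]], region: str) -> float:
    """Framework-free tool logic: total revenue for one region."""
    return sum(
        (float(row["revenue"]) for row in rows if row["region"] == region),
        0.0,
    )


# Made-up sample data standing in for the real sales CSV.
SAMPLE_CSV = "region,revenue\nEurope,750000\nEurope,500000\nAsia,300000\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

europe_total = revenue_for_region(rows, "Europe")
```

Each framework then needs only a thin decorated wrapper that calls this function, like the @tool and @agent.tool_plain examples earlier, so both can share it while you move agents across one at a time.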

The framework decision is one input. The harder question is whether your agent stack is ready for what happens after the demo - the invented menu options, the silent validation gaps, the CI runs that still hit the API.

Book a 20-minute intro call and tell me what you're building. Or jump straight to an Architecture Call or an AI Code Health Check.


Follow me on Twitter: https://twitter.com/DevAsService

Follow me on Instagram: https://www.instagram.com/devasservice/

Follow me on TikTok: https://www.tiktok.com/@devasservice

Follow me on YouTube: https://www.youtube.com/@DevAsService