Structured LLM Outputs with Pydantic v2: Stop Parsing Freeform JSON and Start Typing Your AI
Peyton Green · 2026-05-19 · via DEV Community

The biggest source of subtle bugs in AI applications isn't the model — it's the gap between what you asked for and what you got.

You prompt for {"score": 8, "issues": ["missing error handling"]} and you get {"score": "8/10", "issues": "missing error handling"}. Both are technically valid JSON. One breaks your downstream code. Neither triggers an exception until hours later when you're wondering why the aggregation is wrong.

Pydantic v2 eliminates this class of bugs. Here's how to structure your LLM outputs so type errors are caught at the boundary, not buried in production.


The problem with freeform JSON parsing

Most developers start here:

import json
from anthropic import Anthropic

client = Anthropic()

def analyze_code(code: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Analyze this code and return JSON with: severity (int 1-10), issues (list of strings), has_security_risk (bool).\n\n{code}"
        }]
    )
    return json.loads(response.content[0].text)


This fails in three ways you won't notice until production:

  1. Types silently wrong. The model returns "severity": "8" instead of 8. json.loads keeps it as a string. In Python 3, a downstream severity > 7 comparison then raises TypeError, and an equality check such as severity == 8 is silently False for every input.

  2. Missing fields. The model occasionally omits has_security_risk when it seems obvious from context. KeyError three calls in, two hours into a batch job.

  3. Schema drift. You update the prompt. The model starts returning an extra field. Your downstream code ignores it. A week later you realize the data you've been storing is inconsistent.
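The first two failure modes can be reproduced with nothing but the standard library. This is a minimal sketch; the reply string is a hypothetical model response, not real output.

```python
import json

# Hypothetical model reply: valid JSON, but `severity` is a string and
# `has_security_risk` is missing entirely.
reply = '{"severity": "8", "issues": ["missing error handling"]}'
data = json.loads(reply)  # parses without complaint

severity_is_str = isinstance(data["severity"], str)

# An equality check against an int is silently False.
equals_eight = data["severity"] == 8

# An ordering comparison between str and int raises TypeError in Python 3.
try:
    data["severity"] > 7
    comparison_error = None
except TypeError as exc:
    comparison_error = exc

# The missing field only fails when something finally reads it.
try:
    data["has_security_risk"]
    missing_error = None
except KeyError as exc:
    missing_error = exc
```

Nothing here fails at parse time; every problem surfaces later, at whichever line happens to touch the bad value first.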


The Pydantic v2 fix

Define your output schema first:

from pydantic import BaseModel, Field, field_validator
from typing import Annotated

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""  # optional with default

    @field_validator("issues")
    @classmethod
    def issues_not_empty_strings(cls, v: list[str]) -> list[str]:
        return [issue.strip() for issue in v if issue.strip()]


Now parse with validation:

def analyze_code(code: str) -> CodeAnalysis:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Analyze this code. Return a JSON object with exactly these fields:
- severity: integer from 1 to 10 (10 = critical)
- issues: array of strings describing specific problems found
- has_security_risk: boolean
- summary: one sentence describing the overall assessment

Code:
{code}"""
        }]
    )

    raw = extract_json(response.content[0].text)
    return CodeAnalysis.model_validate(raw)


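In its default (lax) mode, the model_validate call coerces a numeric string such as "8" to 8, rejects values it cannot convert (such as "8/10"), raises ValidationError on missing required fields, and runs your custom validators. The error surfaces at the boundary, not downstream.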
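To see that behaviour without calling the API, here is a small sketch that validates hand-written payloads. The model mirrors CodeAnalysis above minus the custom validator; the payloads and the is_rejected helper are illustrative assumptions.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError


class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""


# A numeric string is coerced to int in Pydantic's default (lax) mode.
ok = CodeAnalysis.model_validate(
    {"severity": "8", "issues": ["missing error handling"], "has_security_risk": False}
)


def is_rejected(payload: dict) -> bool:
    """Return True if the payload fails CodeAnalysis validation."""
    try:
        CodeAnalysis.model_validate(payload)
    except ValidationError:
        return True
    return False


# "8/10" is not a valid integer, and a bare string is not a list.
bad_types = {"severity": "8/10", "issues": "missing error handling", "has_security_risk": False}
# A required field is missing.
missing_field = {"severity": 8, "issues": []}

print(ok.severity, is_rejected(bad_types), is_rejected(missing_field))
```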


Extracting JSON from model responses

Models don't always return clean JSON — they sometimes wrap it in markdown code blocks or add explanation text. A reliable extractor:

import re

def extract_json(text: str) -> dict:
    """Extract JSON from model response, handling markdown code blocks."""
    # Try markdown code block first
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if match:
        return json.loads(match.group(1))

    # Try raw JSON object
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))

    raise ValueError(f"No JSON found in response: {text[:200]}")


This handles the three most common response formats:

  • {"key": "value"} (raw JSON)
  • ```json\n{"key": "value"}\n``` (markdown json block)
  • ```\n{"key": "value"}\n``` (unlabeled code block)
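A quick way to check the extractor against those three formats is to run it on hand-built samples. In this sketch the extractor is restated so the snippet runs on its own, and FENCE is a helper introduced here only so the triple backticks do not clash with the surrounding code formatting; the sample strings are made up.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the snippet nests safely


def extract_json(text: str) -> dict:
    """Extract a JSON object from a model reply, preferring fenced blocks."""
    fenced = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, text, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    raw = re.search(r"\{.*\}", text, re.DOTALL)
    if raw:
        return json.loads(raw.group(0))
    raise ValueError(f"No JSON found in response: {text[:200]}")


samples = [
    '{"key": "value"}',                                                      # raw JSON
    f'Here you go:\n{FENCE}json\n{{"key": "value"}}\n{FENCE}\nLet me know.',  # json block
    f'{FENCE}\n{{"key": "value"}}\n{FENCE}',                                  # unlabeled block
]
results = [extract_json(sample) for sample in samples]
```

Note that the raw-JSON fallback is greedy: it grabs everything from the first `{` to the last `}`, so a reply containing two separate JSON objects will fail to parse rather than return the first one.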

Prompt patterns that produce consistent schema adherence

The prompt matters as much as the parser. Patterns that reduce schema drift:

Explicit field types in the prompt:

Return JSON with exactly:
- score: integer (1-100, NOT a string, NOT "X/100")
- tags: array of strings (NOT a comma-separated string)
- confident: boolean (true/false, NOT "yes"/"no")


Spelling out "NOT a string" sounds redundant. It cuts type coercion errors by ~80% in practice.

Repeat the schema in the system prompt:

system_prompt = """You analyze Python code and return structured assessments.

ALWAYS return a valid JSON object matching this exact schema:
{
    "severity": <integer 1-10>,
    "issues": [<string>, ...],
    "has_security_risk": <boolean>,
    "summary": <string>
}

Never include markdown formatting. Never add extra fields. Never omit required fields."""


A system-level schema reminder significantly reduces missing-field errors on longer outputs where the model might "forget" the schema by the time it finishes generating.
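The "never add extra fields" instruction can also be enforced on the parsing side. By default Pydantic ignores unknown keys, so one option is to forbid them in the model config. This is a sketch: StrictAnalysis and the confidence field are hypothetical names used only for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictAnalysis(BaseModel):
    # extra="forbid" turns unexpected keys into validation errors;
    # Pydantic's default is to ignore them silently.
    model_config = ConfigDict(extra="forbid")

    severity: int
    issues: list[str]
    has_security_risk: bool


payload = {
    "severity": 4,
    "issues": [],
    "has_security_risk": False,
    "confidence": 0.9,  # hypothetical field the model started adding
}

try:
    StrictAnalysis.model_validate(payload)
    error_types = []
except ValidationError as exc:
    error_types = [(err["loc"], err["type"]) for err in exc.errors()]

print(error_types)
```

Whether to forbid or ignore extras is a judgment call: forbidding makes drift visible immediately, at the cost of rejecting otherwise usable responses.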

Temperature for structured outputs:

For strict schema adherence, use a lower temperature (0.2-0.4). The default temperature (1.0 on the Anthropic API) trades consistency for creativity: fine for prose, wrong for structured data.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    temperature=0.3,  # deterministic enough for reliable JSON
    ...
)



Handling validation errors gracefully

Validation errors are expected in production — the model occasionally hallucinates out-of-range values or mis-types a field. Don't let them crash your application:

from pydantic import ValidationError
import logging

logger = logging.getLogger(__name__)

def analyze_code_safe(code: str) -> CodeAnalysis | None:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            temperature=0.3,
            messages=[...],
        )
        raw = extract_json(response.content[0].text)
        return CodeAnalysis.model_validate(raw)

    except ValidationError as e:
        logger.warning(
            "Schema validation failed",
            extra={"errors": e.errors(), "code_snippet": code[:100]}
        )
        return None

    except (ValueError, json.JSONDecodeError) as e:
        logger.error("JSON extraction failed", extra={"error": str(e)})
        return None


Log the validation errors — e.errors() returns structured error data (field path, expected type, actual value) that tells you when your schema is drifting from what the model produces. Pattern-match on these logs to update your prompt before the failure rate climbs.
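For a feel of what those log entries contain, the sketch below validates a deliberately broken payload and reduces each error to a (field path, error type) pair. The summarize_errors helper and the payload are assumptions made for illustration.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError


class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool


def summarize_errors(payload: dict) -> list[tuple[str, str]]:
    """Return (field path, error type) pairs for a payload, or [] if it is valid."""
    try:
        CodeAnalysis.model_validate(payload)
    except ValidationError as exc:
        return [
            (".".join(str(part) for part in err["loc"]), err["type"])
            for err in exc.errors()
        ]
    return []


# Out-of-range severity, a string where a list is expected, and a missing field.
summary = summarize_errors({"severity": 11, "issues": "none"})
for path, error_type in summary:
    print(f"{path}: {error_type}")
```

Counting these pairs over time gives a cheap drift signal: a rising count for one field and error type usually points at a specific prompt instruction that stopped working.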


Nested schemas

For complex outputs, compose Pydantic models:

from typing import Annotated, Literal

from pydantic import BaseModel, Field

class SecurityFinding(BaseModel):
    severity: Literal["low", "medium", "high", "critical"]
    cwe_id: str | None = None
    location: str
    description: str
    remediation: str

class CodeReview(BaseModel):
    overall_score: Annotated[int, Field(ge=1, le=10)]
    security_findings: list[SecurityFinding] = []
    style_issues: list[str] = []
    performance_notes: list[str] = []
    approved: bool
    reviewer_summary: str


Pydantic v2 validates nested models recursively: if security_findings contains an item that doesn't match SecurityFinding, the ValidationError points to the exact path, such as security_findings[2].severity, which e.errors() reports as the loc tuple ("security_findings", 2, "severity").
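A trimmed-down sketch of that nested error path, using shortened versions of the two models and a made-up payload whose second finding has an invalid severity:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class SecurityFinding(BaseModel):
    severity: Literal["low", "medium", "high", "critical"]
    location: str
    description: str


class CodeReview(BaseModel):
    security_findings: list[SecurityFinding] = []
    approved: bool


payload = {
    "approved": False,
    "security_findings": [
        {"severity": "high", "location": "get_user", "description": "SQL injection"},
        # "severe" is not one of the allowed literals.
        {"severity": "severe", "location": "login", "description": "Weak hashing"},
    ],
}

try:
    CodeReview.model_validate(payload)
    locations = []
except ValidationError as exc:
    # Each loc is a tuple of field names and list indexes.
    locations = [err["loc"] for err in exc.errors()]

print(locations)
```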

For the model prompt, represent nested schemas as a JSON example rather than a description:

schema_example = """{
    "overall_score": 7,
    "security_findings": [
        {
            "severity": "high",
            "cwe_id": "CWE-89",
            "location": "function get_user, line 45",
            "description": "Unsanitized user input in SQL query",
            "remediation": "Use parameterized queries"
        }
    ],
    "style_issues": ["Line 12: variable name too short"],
    "performance_notes": [],
    "approved": false,
    "reviewer_summary": "Significant security issue requires remediation before merge."
}"""


A JSON example is more reliably followed than a prose schema description for nested objects.
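One risk with a hand-written example is that it drifts away from the Pydantic model it is supposed to illustrate. A possible guard, sketched here as an assumption rather than something the article prescribes, is to validate the example string against the model at import time before embedding it in a prompt. The prompt wording is illustrative.

```python
from typing import Annotated, Literal, Optional

from pydantic import BaseModel, Field


class SecurityFinding(BaseModel):
    severity: Literal["low", "medium", "high", "critical"]
    cwe_id: Optional[str] = None
    location: str
    description: str
    remediation: str


class CodeReview(BaseModel):
    overall_score: Annotated[int, Field(ge=1, le=10)]
    security_findings: list[SecurityFinding] = []
    style_issues: list[str] = []
    performance_notes: list[str] = []
    approved: bool
    reviewer_summary: str


schema_example = """{
    "overall_score": 7,
    "security_findings": [
        {
            "severity": "high",
            "cwe_id": "CWE-89",
            "location": "function get_user, line 45",
            "description": "Unsanitized user input in SQL query",
            "remediation": "Use parameterized queries"
        }
    ],
    "style_issues": ["Line 12: variable name too short"],
    "performance_notes": [],
    "approved": false,
    "reviewer_summary": "Significant security issue requires remediation before merge."
}"""

# Raises ValidationError immediately if the example no longer matches the model.
example = CodeReview.model_validate_json(schema_example)

# Illustrative prompt text; only built once the example has been validated.
prompt = "Return a JSON object shaped exactly like this example:\n" + schema_example
```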


Streaming with structured outputs

For long outputs where you want to stream but still validate:

import json

def analyze_code_streaming(code: str) -> CodeAnalysis:
    chunks = []

    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        temperature=0.3,
        messages=[...],
    ) as stream:
        for text in stream.text_stream:
            chunks.append(text)
            # optionally yield chunks to caller here

    full_response = "".join(chunks)
    raw = extract_json(full_response)
    return CodeAnalysis.model_validate(raw)


Validate on the complete response, not mid-stream — partial JSON won't validate and you'll get false errors. Stream for latency perception; validate at the end for correctness.
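The reason is easy to demonstrate with the standard library alone. In this sketch the chunks are hypothetical, split the way a streamed reply might arrive, and try_parse is a helper introduced for the demonstration.

```python
import json

# Hypothetical chunks of one streamed reply.
chunks = [
    '{"severity": 8, "iss',
    'ues": ["missing error handling"], ',
    '"has_security_risk": false}',
]


def try_parse(text: str):
    """Return the parsed object, or None if the text is not complete JSON yet."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


buffer = ""
partial_results = []
for chunk in chunks:
    buffer += chunk
    partial_results.append(try_parse(buffer))

# Only the final, complete buffer parses; the earlier attempts yield None.
print([result is not None for result in partial_results])
```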


A complete working pattern

Here's the full pattern assembled, ready to adapt:

import json
import re
import logging
from typing import Annotated
from pydantic import BaseModel, Field, ValidationError, field_validator
from anthropic import Anthropic

logger = logging.getLogger(__name__)
client = Anthropic()

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

    @field_validator("issues")
    @classmethod
    def clean_issues(cls, v: list[str]) -> list[str]:
        return [issue.strip() for issue in v if issue.strip()]

def extract_json(text: str) -> dict:
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if match:
        return json.loads(match.group(1))
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError(f"No JSON found in response: {text[:200]}")

def analyze_code(code: str) -> CodeAnalysis | None:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            temperature=0.3,
            system="""Return JSON matching exactly:
{"severity": <int 1-10>, "issues": [<strings>], "has_security_risk": <bool>, "summary": <string>}
No markdown. No extra fields.""",
            messages=[{"role": "user", "content": f"Analyze:\n\n{code}"}],
        )
        raw = extract_json(response.content[0].text)
        return CodeAnalysis.model_validate(raw)

    except ValidationError as e:
        logger.warning("Validation failed", extra={"errors": e.errors()})
        return None
    except Exception as e:
        logger.error("Analysis failed", extra={"error": str(e)})
        return None



What this gives you that freeform parsing doesn't

  • Type safety end-to-end. analysis.severity is always int. Your type checker knows it. Your IDE autocompletes it.
  • Validation at the boundary. Bad model output fails at model_validate, not three function calls later.
  • Structured error logging. ValidationError.errors() tells you which field, which constraint, which value. Useful for monitoring model drift over time.
  • Schema as documentation. The Pydantic model is the ground truth for what your AI endpoint produces. CodeAnalysis.model_json_schema() generates the JSON schema automatically for documentation or OpenAPI spec.
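As a small illustration of the last point, this sketch generates the JSON schema for the CodeAnalysis model (restated without its validator so the snippet runs on its own) and pulls out the parts most useful for documentation.

```python
from typing import Annotated

from pydantic import BaseModel, Field


class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""


schema = CodeAnalysis.model_json_schema()

# Fields without defaults are listed as required, in definition order.
required = schema["required"]

# The ge/le constraints appear as JSON Schema minimum/maximum.
severity_schema = schema["properties"]["severity"]

print(required)
print(severity_schema)
```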

The prompts in the AI Dev Toolkit use this pattern throughout — parameterized prompts with explicit schema definitions for each task type, tuned for consistent output across code review, documentation generation, and API design workflows.


Further reading


If structured AI output patterns are a repeated part of your Python workflow, the AI Dev Toolkit includes 80+ parameterized prompts for code review, documentation, API design, and debugging — each built around consistent schema output rather than freeform responses.