Mastering Structured JSON Outputs with Gemini API
Ebendttl · 2026-05-27 · via DEV Community

This is an excerpt. The full article includes a live interactive schema sandbox where you can switch between 3 real constraint schemas and watch the Gemini inference engine stream constrained tokens in real time. Read the full interactive version →


The Problem: LLMs Are Eloquent, Not Predictable

Language models are optimized to be helpful communicators. This is precisely what makes them powerful interfaces for humans — and extraordinarily fragile integrations for software architectures.

Consider a simple extraction request:

"Extract the product name, price, and availability from the following text and return it as JSON."


In testing, the model returns a clean JSON block. In high-throughput production environments, though, you will eventually run into the model's conversational habits:

  • Conversational Padding: "Here is the data you requested: ..."
  • Varying Key Names: One response returns "product_name", another "product", a third "name"
  • Brittle Typings: A numeric price 279.99 becomes the raw string "$279.99"

Your downstream TypeScript code reads undefined where it expected a value, or throws an unhandled SyntaxError or TypeError. The pipeline fails.
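The three failure modes above can be reproduced with a strict parser. This is a minimal sketch: the Product interface, the parseProduct function, and the sample strings are hypothetical and exist only to mirror the bullets.

```typescript
// Hypothetical shape the downstream code expects.
interface Product {
  product_name: string;
  price: number;
  availability: string;
}

type ParseResult =
  | { ok: true; value: Product }
  | { ok: false; reason: string };

// Strict parser: accepts only a bare JSON object with the expected keys and types.
function parseProduct(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    // Conversational padding around the JSON ends up here.
    return { ok: false, reason: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, reason: "not a JSON object" };
  }
  const obj = data as Record<string, unknown>;
  const productName = obj["product_name"];
  const price = obj["price"];
  const availability = obj["availability"];
  if (typeof productName !== "string") {
    return { ok: false, reason: "product_name missing or not a string" };
  }
  if (typeof price !== "number") {
    return { ok: false, reason: "price is not a number" };
  }
  if (typeof availability !== "string") {
    return { ok: false, reason: "availability missing or not a string" };
  }
  return { ok: true, value: { product_name: productName, price, availability } };
}

// One clean response and one sample per failure mode.
const clean = '{"product_name":"Headphones","price":279.99,"availability":"in stock"}';
const padded = "Here is the data you requested: " + clean;
const renamed = '{"product":"Headphones","price":279.99,"availability":"in stock"}';
const stringPrice = '{"product_name":"Headphones","price":"$279.99","availability":"in stock"}';

for (const sample of [clean, padded, renamed, stringPrice]) {
  const result = parseProduct(sample);
  console.log(result.ok ? "ok" : "rejected: " + result.reason);
}
```

Only the first sample passes. The other three differ from it by a prefix, a key name, and a type, which is exactly the kind of variation an unconstrained model produces.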


Why Regex and Prompt Engineering Will Betray You

The classic fix is prompt escalation:

"Return ONLY a raw JSON object. Do NOT wrap in markdown. NEVER write conversational text."


This reduces failures at low volume, but instruction-following is probabilistic. On unexpectedly long inputs, the model can drift back to its conversational baseline. In a system handling 50,000 calls/day, a 1% failure rate means 500 failed calls every day.

Custom regex parsing is worse. When the provider updates the model and the output format shifts even slightly, a regex can keep matching the wrong text and silently corrupt production data.
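To make the silent-corruption risk concrete, here is a small sketch of a naive regex extractor. The function name extractPriceWithRegex and both sample responses are made up for illustration; the point is only that the regex keeps "working" after the response wording changes.

```typescript
// Naive extractor: take the first decimal number in the response and call it the price.
function extractPriceWithRegex(raw: string): number | null {
  const match = /(\d+\.\d+)/.exec(raw);
  return match ? parseFloat(match[1]) : null;
}

// Response format the regex was written against.
const clean = '{"product_name": "Headphones", "price": 279.99}';

// Hypothetical drift: the model starts adding a preamble that happens to contain a decimal.
const drifted = 'Here is the result (schema v2.0): {"product_name": "Headphones", "price": 279.99}';

console.log(extractPriceWithRegex(clean));   // 279.99
console.log(extractPriceWithRegex(drifted)); // 2
```

No exception is thrown in the second case. The extractor returns a plausible-looking number, so the bad value flows into storage unnoticed.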


Constrained Decoding: Enforcing Structure at the Inference Layer

Gemini's structured output system works via vocabulary masking during the inference step itself — not post-processing.

When generating a response, the model assigns a probability to every token in its vocabulary, which for current LLMs typically holds tens to hundreds of thousands of tokens (sub-word pieces, not whole words). Without constraints, it samples freely. When you enforce a JSON Schema contract, the schema is compiled into what is effectively a state machine. At every generation step, tokens that would violate the schema are masked to zero probability.

If a field expects a number, every token that cannot continue a valid JSON number ("twenty", "$", ordinary alphabetic text) is eliminated before sampling. This is not retrying or filtering; it is a structural constraint applied inside the decoding loop.

Candidate token | Standard Decoding | Constrained Decoding (Gemini)
"$279.99"       | 45% probability   | 0% probability
"279.99"        | 40% probability   | 100% probability
"in stock"      | 15% probability   | 0% probability

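The masking step can be illustrated with a toy three-token vocabulary that reuses the probabilities from the table. This is a conceptual sketch only, not Gemini's implementation: the helper names (softmax, maskLogits, isNumberToken) are invented here, and a real decoder works on sub-word tokens with a grammar-aware state machine rather than a single regex.

```typescript
// Convert logits to probabilities. A logit of -Infinity yields exactly 0.
function softmax(logits: number[]): number[] {
  let max = -Infinity;
  for (const l of logits) {
    if (l > max) max = l;
  }
  if (max === -Infinity) {
    throw new Error("every token is masked; nothing can be sampled");
  }
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Replace the logit of every disallowed token with -Infinity.
function maskLogits(logits: number[], allowed: boolean[]): number[] {
  return logits.map((l, i) => (allowed[i] ? l : -Infinity));
}

// Toy vocabulary and logits chosen so the unconstrained distribution is 45% / 40% / 15%.
const vocab = ["$279.99", "279.99", "in stock"];
const logits = [Math.log(0.45), Math.log(0.4), Math.log(0.15)];

// Toy stand-in for the schema state machine: the field expects a plain number.
const isNumberToken = (token: string): boolean => /^-?\d+(\.\d+)?$/.test(token);
const allowed = vocab.map(isNumberToken);

const unconstrained = softmax(logits);
const constrained = softmax(maskLogits(logits, allowed));

vocab.forEach((token, i) => {
  console.log(token, unconstrained[i].toFixed(2), constrained[i].toFixed(2));
});
```

After masking, the remaining legal token absorbs all of the probability mass, which is why the constrained column in the table shows 100% for "279.99".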
The Two API Pillars

Activate structured execution with two native parameters:

import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  generationConfig: {
    responseMimeType: "application/json",  // Pillar 1
    responseSchema: {                       // Pillar 2
      type: SchemaType.OBJECT,
      properties: {
        sentiment: {
          type: SchemaType.STRING,
          enum: ["VERY_POSITIVE", "POSITIVE", "NEUTRAL", "NEGATIVE", "VERY_NEGATIVE"]
        },
        csat_risk_score: {
          type: SchemaType.NUMBER,
          description: "0=no risk, 10=certain churn"
        },
        requires_human: { type: SchemaType.BOOLEAN }
      },
      required: ["sentiment", "csat_risk_score", "requires_human"]
    }
  }
});


responseMimeType: "application/json" tells the API to return JSON rather than free-form text. responseSchema defines the structural contract the response must satisfy: keys, types, enums, and required fields.


JSON Schema Deep Dive

Enums — The Most Powerful Constraint

Enums force Gemini to select from a hardcoded array of values. This is the single most impactful constraint for classification systems:

{
  "type": "string",
  "enum": ["IN_STOCK", "OUT_OF_STOCK", "BACKORDER"]
}


No hallucinated variants. No "in stock" vs "In Stock" inconsistencies. The schema enforces it at the token level.
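On the TypeScript side, the same enum list can drive both the schema fragment and the static type, so the two cannot drift apart. The sketch below is one possible arrangement; AVAILABILITY, isAvailability, and describeAvailability are hypothetical names, and the schema is shown as a plain object rather than through any SDK helper.

```typescript
// Single source of truth for the allowed values.
const AVAILABILITY = ["IN_STOCK", "OUT_OF_STOCK", "BACKORDER"] as const;
type Availability = (typeof AVAILABILITY)[number];

// Plain JSON Schema fragment built from the same list.
const availabilitySchema = {
  type: "string",
  enum: [...AVAILABILITY],
};

// Runtime guard for values arriving from outside the type system.
function isAvailability(value: unknown): value is Availability {
  return (
    typeof value === "string" &&
    (AVAILABILITY as readonly string[]).indexOf(value) !== -1
  );
}

// Exhaustive handling: adding a new enum value without a case fails to compile.
function describeAvailability(status: Availability): string {
  switch (status) {
    case "IN_STOCK":
      return "Ships today";
    case "OUT_OF_STOCK":
      return "Unavailable";
    case "BACKORDER":
      return "Ships when restocked";
    default: {
      const unreachable: never = status;
      throw new Error("Unhandled status: " + String(unreachable));
    }
  }
}

console.log(availabilitySchema.enum.length);      // 3
console.log(isAvailability("in stock"));          // false
console.log(describeAvailability("BACKORDER"));   // Ships when restocked
```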

Nullable Attributes

{ "type": "string", "nullable": true }


This gives the model a legitimate way to report missing data. If the input text contains no reference to that field, Gemini can output null instead of being forced to invent a value.
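Downstream code then has to treat null as a real, expected value. A short sketch, assuming a hypothetical ProductDetails record whose warranty field is declared nullable in the schema:

```typescript
// Hypothetical extraction result; `warranty` corresponds to a nullable string field.
interface ProductDetails {
  product_name: string;
  warranty: string | null;
}

// Make the "not present in the source" case explicit instead of defaulting silently.
function warrantyLabel(details: ProductDetails): string {
  return details.warranty === null ? "Not stated in source" : details.warranty;
}

// Sample payloads standing in for model responses.
const withoutWarranty = JSON.parse(
  '{"product_name":"Headphones","warranty":null}'
) as ProductDetails;
const withWarranty = JSON.parse(
  '{"product_name":"Headphones","warranty":"2 years"}'
) as ProductDetails;

console.log(warrantyLabel(withoutWarranty)); // Not stated in source
console.log(warrantyLabel(withWarranty));    // 2 years
```

Typing the field as string | null makes the compiler force every consumer to handle the missing case.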


The Multi-Stage Orchestration Pattern

For complex documents, never attempt a single massive extraction call. Instead, decompose into modular pipelines:

Raw Document
    ↓
Stage 1: Classification (Schema: DocType)
    ↓
Stage 2A: Invoice Parser  |  Stage 2B: Legal Contract  |  Stage 2C: Receipt Parser
    ↓                              ↓                              ↓
                     Unified Structured Database


Each stage uses a narrow, optimized schema. This reduces cost, increases accuracy, and makes failures easier to trace to a single stage.
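The routing in the diagram can be sketched as a classifier followed by a lookup table of parsers. Everything here is illustrative: the type and function names are invented, and the stages are synchronous stubs so the example runs offline. In a real pipeline each stage would be an asynchronous model call with its own narrow responseSchema.

```typescript
type DocType = "INVOICE" | "LEGAL_CONTRACT" | "RECEIPT";

interface StructuredRecord {
  docType: DocType;
  fields: Record<string, string>;
}

// Stage 1: returns one value from a small enum.
type Classifier = (text: string) => DocType;
// Stage 2: one narrow parser per document type.
type StageParser = (text: string) => Record<string, string>;

// Classify first, then dispatch to the parser registered for that type.
function processDocument(
  text: string,
  classify: Classifier,
  parsers: Record<DocType, StageParser>
): StructuredRecord {
  const docType = classify(text);
  const fields = parsers[docType](text);
  return { docType, fields };
}

// Stub classifier standing in for a model call with an enum-only schema.
const stubClassify: Classifier = (text) => {
  if (/invoice/i.test(text)) return "INVOICE";
  if (/receipt/i.test(text)) return "RECEIPT";
  return "LEGAL_CONTRACT";
};

// Stub parsers standing in for model calls with type-specific schemas.
const stubParsers: Record<DocType, StageParser> = {
  INVOICE: (text) => ({ parser: "invoice", chars: String(text.length) }),
  LEGAL_CONTRACT: (text) => ({ parser: "legal_contract", chars: String(text.length) }),
  RECEIPT: (text) => ({ parser: "receipt", chars: String(text.length) }),
};

const record = processDocument("Invoice for services rendered", stubClassify, stubParsers);
console.log(record.docType, record.fields.parser); // INVOICE invoice
```

Because the parsers map is typed as Record<DocType, StageParser>, adding a new document type without registering a parser is a compile-time error.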


Production Validation Layer

Schema enforcement guarantees structural correctness — not logical correctness. Always include downstream validation:

import { z } from "zod";

const SentimentSchema = z.object({
  sentiment: z.enum(["VERY_POSITIVE", "POSITIVE", "NEUTRAL", "NEGATIVE", "VERY_NEGATIVE"]),
  csat_risk_score: z.number().min(0).max(10),
  requires_human: z.boolean()
});

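// model is the schema-configured model from the earlier snippet; prompt is your input text.
const result = await model.generateContent(prompt);
const parsed: unknown = JSON.parse(result.response.text());
const validated = SentimentSchema.safeParse(parsed);

if (!validated.success) {
  // The API enforces structure; this branch catches semantic violations,
  // such as a csat_risk_score outside the 0-10 range.
  console.error("Validation failed:", validated.error);
}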


Gemini guarantees that required keys exist and types match. It cannot know whether a discount value is negative or whether invoice line items fail to sum to the stated total. Always validate semantic constraints downstream.
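The two invoice checks mentioned above might look like the following. This is a sketch under stated assumptions: the Invoice and LineItem shapes are hypothetical, and the rule that the total equals the sum of line items minus the discount is an assumption made for the example, not something the schema or the API defines.

```typescript
// Hypothetical invoice shape, already structurally valid per the schema.
interface LineItem {
  description: string;
  amount: number;
}

interface Invoice {
  line_items: LineItem[];
  discount: number;
  total: number;
}

// Returns a list of semantic problems; an empty list means the invoice is consistent.
// Assumed business rule: total = sum(line_items) - discount, within a small tolerance.
function validateInvoice(invoice: Invoice, tolerance: number = 0.005): string[] {
  const errors: string[] = [];
  if (invoice.discount < 0) {
    errors.push("discount must not be negative");
  }
  const subtotal = invoice.line_items.reduce((sum, item) => sum + item.amount, 0);
  const expected = subtotal - invoice.discount;
  if (Math.abs(expected - invoice.total) > tolerance) {
    errors.push(
      "expected total " + expected.toFixed(2) + " but invoice states " + invoice.total.toFixed(2)
    );
  }
  return errors;
}

const consistent: Invoice = {
  line_items: [
    { description: "Widget", amount: 100 },
    { description: "Shipping", amount: 50.5 },
  ],
  discount: 10,
  total: 140.5,
};

const inconsistent: Invoice = {
  line_items: [{ description: "Widget", amount: 100 }],
  discount: -5,
  total: 90,
};

console.log(validateInvoice(consistent));   // []
console.log(validateInvoice(inconsistent)); // two errors
```

Both sample objects would pass schema enforcement, since every key is present and every type matches. Only the second check layer distinguishes them.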


Engineering Takeaways

  1. Never rely on instruction-following alone. Probabilistic models will drift. Use structural constraints at the API level.
  2. responseMimeType + responseSchema is the only production-safe pattern for JSON extraction pipelines.
  3. Enums are your most powerful tool — they eliminate entire classes of inconsistency bugs.
  4. Constrained decoding ≠ logical validation. Layer Zod or Pydantic downstream.
  5. Multi-stage pipelines outperform single massive calls for complex document structures.

🔬 The full article includes an interactive Gemini Constraint Engine sandbox — select from 3 real schema contracts (Sentiment Tracker, Invoice Parser, Code Auditor) and watch constrained token streaming in real time. It also covers complex nested schemas, entity extraction patterns, cost/latency optimization, and the future of agentic orchestration.



Written by Ebenezer Akinseinde — Software Developer & AI Automations Engineer. Building fast, production-grade AI pipelines and distributed frontend systems.

Portfolio · GitHub