Chat is Dead: How JSON Prompting Cut My AI Costs by 73%
CallmeMiho · 2026-05-21 · via DEV Community

I burned $2,400 in 3 weeks talking to AI like a human

For 18 months, I built AI features the "normal" way: conversational prompts, friendly instructions, "please" and "thank you" sprinkled in. It worked, until our user base grew 10x in January.

Our monthly OpenAI bill went from $800 to $4,100. Same features. Same users. Just more conversations.

That's when I discovered JSON prompting. Not as a nice-to-have. As a survival requirement.

Three weeks after migrating our entire stack, our bill dropped to $1,107. A 73% reduction. Here's the exact system.

Why chat interfaces are a tax on engineering

Traditional prompting looks like this:

```javascript
const prompt = `
  You're a helpful assistant. Please extract the user's name,
  email, and company from this text. Be polite and return
  the data in a friendly format.

  Text: ${userInput}
`;
```

The problems:

  • Unpredictable output: Sometimes JSON, sometimes markdown, sometimes an apology
  • Token bloat: "Please," "helpful," "friendly" = 12 wasted tokens per call
  • Parser hell: JSON.parse() fails 23% of the time (my actual metric)
  • No schema validation: You find out it's broken in production

When you're doing 500K calls/month, those 12 tokens become 6M tokens. At $0.03/1K tokens, that's $180/month spent on words like "please."
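The same arithmetic works for any fixed per-call overhead. A minimal sketch as a reusable helper; `monthlyOverheadCost` is a hypothetical name, and `pricePer1K` should be your model's actual rate for the tokens in question (filler in the prompt is billed as input):

```javascript
// Hypothetical helper: monthly cost of a fixed number of extra tokens on every call.
// pricePer1K is the price per 1,000 tokens for that token type (input tokens here).
function monthlyOverheadCost(extraTokensPerCall, callsPerMonth, pricePer1K) {
  const extraTokens = extraTokensPerCall * callsPerMonth; // 12 * 500,000 = 6,000,000
  return (extraTokens / 1000) * pricePer1K;
}

// Figures from the text: 12 filler tokens, 500K calls/month, $0.03 per 1K tokens.
console.log(monthlyOverheadCost(12, 500_000, 0.03).toFixed(2)); // "180.00"
```

Because the cost scales linearly with call volume, trimming fixed instructions pays off most on the highest-traffic endpoints.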

JSON prompting: treating LLMs like APIs

Here's the same task with JSON prompting:

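```javascript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const userInput = "Hi, I'm Ada Lovelace (ada@example.com) from Analytical Engines Ltd."; // hypothetical sample input

const prompt = {
  "schema": {
    "name": "string",
    "email": "string (valid format)",
    "company": "string"
  },
  "instructions": "Extract from text. Return ONLY valid JSON.",
  "text": userInput
};

const response = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [{
    role: "user",
    content: JSON.stringify(prompt)
  }],
  // JSON mode: output is valid JSON, but it is not checked against your schema.
  // The word "JSON" must appear in the messages when this mode is on.
  response_format: { type: "json_object" }
});

const extracted = JSON.parse(response.choices[0].message.content);
```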

Result: 100% parse success rate. Zero fluff. 34% fewer tokens.
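JSON mode makes the model emit syntactically valid JSON, but OpenAI's documentation notes that the content can be cut off when the response hits the token limit, which shows up as a `finish_reason` of `"length"`. A minimal sketch of a parse step that checks for that first; `parseJsonChoice` is a hypothetical helper that takes one element of `response.choices`:

```javascript
// Hypothetical helper: parse one chat-completion choice produced in JSON mode.
// A choice whose finish_reason is "length" was truncated, so its JSON may be incomplete.
function parseJsonChoice(choice) {
  if (choice.finish_reason === "length") {
    throw new Error("Response truncated at the token limit; raise the limit or shrink the input");
  }
  return JSON.parse(choice.message.content);
}

// Usage: const extracted = parseJsonChoice(response.choices[0]);
```

Failing loudly on truncation keeps a half-written object from reaching the parser and being mistaken for a model error.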

The hidden reason it saves 73% (it's not the tokens)

Everyone focuses on token reduction. That's the small win.

The big win is eliminating retry loops.

With chat prompting, my flow looked like:

  1. Send prompt → get markdown instead of JSON
  2. Retry with "return ONLY JSON" → get JSON with comments
  3. Retry again → finally get clean JSON
  4. Parse → crash on edge case
  5. Add try/catch → retry entire flow

Average: 2.7 API calls per successful extraction.
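The retry flow above boils down to a loop that re-calls the model until `JSON.parse` succeeds. A minimal sketch, assuming `callModel` is a hypothetical async function that returns the model's raw text; counting attempts is one way to measure a figure like "calls per successful extraction":

```javascript
// Hypothetical retry wrapper: keep calling the model until its output parses as JSON.
// callModel is any async function returning the model's raw text; it receives the
// attempt number so a caller can tighten the prompt on retries ("return ONLY JSON").
async function extractWithRetry(callModel, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(attempt); // every attempt is a billed API call
    try {
      return { data: JSON.parse(raw), attempts: attempt };
    } catch (err) {
      lastError = err; // e.g. prose or markdown wrapped around the JSON
    }
  }
  throw new Error(`No parseable JSON after ${maxAttempts} attempts: ${lastError?.message}`);
}

// Fake model for illustration: chatty on the first call, clean JSON on the second.
const fakeModel = async (attempt) =>
  attempt === 1 ? 'Sure! Here is the data: {"name": "Ada"}' : '{"name": "Ada"}';

extractWithRetry(fakeModel).then(({ data, attempts }) => {
  console.log(data.name, attempts); // Ada 2
});
```

Each failed parse costs a full round trip, which is why the number of calls per task matters more than the token count of any single call.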

With JSON prompting + response_format:

  1. Send prompt → get valid JSON (JSON mode guarantees syntax, not your schema)
  2. Parse → works

Average: 1.0 calls.

That's a 63% reduction in API calls before token savings. Combined with schema efficiency: 73% total cost cut.
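Cost per task is roughly calls per task × tokens per call × price, so the two savings multiply rather than add. A back-of-envelope check using the averages this post reports (2.7 → 1.0 calls per task, 1,240 → 820 tokens per call), assuming one flat per-token price; `costReduction` is a hypothetical helper:

```javascript
// Back-of-envelope: relative cost per task = (calls per task) x (tokens per call),
// assuming a single flat per-token price (real pricing splits input and output).
function costReduction(before, after) {
  const beforeCost = before.callsPerTask * before.tokensPerCall;
  const afterCost = after.callsPerTask * after.tokensPerCall;
  return 1 - afterCost / beforeCost;
}

const reduction = costReduction(
  { callsPerTask: 2.7, tokensPerCall: 1240 },
  { callsPerTask: 1.0, tokensPerCall: 820 }
);
console.log(`${(reduction * 100).toFixed(1)}%`); // "75.5%"
```

That estimate lands near the measured 73% drop in the monthly bill; a gap is plausible because per-call averages and per-token prices are not uniform across calls.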

The reasoning token trap

Here's what nobody tells you about "thinking models" (o1, Claude 3.7, Gemini 2.0):

When you enable reasoning, you're billed for those internal thoughts as output tokens.

I pasted 500K tokens of codebase for analysis. The model used 187K "reasoning tokens" to think about it. My bill: $18.40 for thinking, $15 for the answer.

In my case, JSON prompting constrained the reasoning. The model had less room to "think" in prose and mapped more directly to the schema. My reasoning token usage dropped 81% (187K → 35K).

```javascript
// Before: 500K context + 187K reasoning = $18.40
// After: 500K context + 35K reasoning = $6.20
```
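To see how much of a bill is reasoning, read the token counts the API returns. In OpenAI's Chat Completions responses, `usage.completion_tokens_details.reasoning_tokens` reports reasoning tokens, and they are already included in `usage.completion_tokens`, which is billed at the output rate. A minimal sketch; `costBreakdown` is hypothetical and the per-million prices are placeholders, not real rates:

```javascript
// Hypothetical helper: split the cost of one call into input, visible output, and reasoning.
// `usage` has the shape of an OpenAI Chat Completions `usage` object, where
// completion_tokens already includes completion_tokens_details.reasoning_tokens.
// Prices are per 1M tokens and are placeholders; substitute your model's actual rates.
function costBreakdown(usage, { inputPerM, outputPerM }) {
  const reasoning = usage.completion_tokens_details?.reasoning_tokens ?? 0;
  const visibleOutput = usage.completion_tokens - reasoning;
  return {
    input: (usage.prompt_tokens / 1e6) * inputPerM,
    visibleOutput: (visibleOutput / 1e6) * outputPerM,
    reasoning: (reasoning / 1e6) * outputPerM, // reasoning is billed as output
  };
}

const breakdown = costBreakdown(
  { prompt_tokens: 2000, completion_tokens: 1500,
    completion_tokens_details: { reasoning_tokens: 1000 } },
  { inputPerM: 1, outputPerM: 4 } // placeholder prices
);
console.log(breakdown); // { input: 0.002, visibleOutput: 0.002, reasoning: 0.004 }
```

Logging this per endpoint shows whether a prompt change actually shrank reasoning or just moved tokens around.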

Migration: 3 files changed

Step 1: Define schemas (schemas.js)

```javascript
export const schemas = {
  userExtraction: {
    type: "object",
    properties: {
      name: { type: "string" },
      email: { type: "string", format: "email" },
      company: { type: "string" }
    },
    required: ["name", "email"]
  }
};
```

Step 2: Create prompt builder (prompt.js)

```javascript
export const buildPrompt = (schema, data) => ({
  schema,
  data,
  instruction: "Return ONLY valid JSON matching schema. No markdown."
});
```
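`JSON.stringify` with no indent argument emits no whitespace between tokens, which keeps the serialized prompt small. A quick comparison using the `buildPrompt` shape from Step 2, redefined here so the snippet runs on its own; the sample input is hypothetical, and character count is only a rough proxy for token count:

```javascript
// Same shape as buildPrompt in prompt.js, inlined so this snippet is self-contained.
const buildPrompt = (schema, data) => ({
  schema,
  data,
  instruction: "Return ONLY valid JSON matching schema. No markdown."
});

const schema = {
  type: "object",
  properties: { name: { type: "string" }, email: { type: "string", format: "email" } },
  required: ["name", "email"]
};
const prompt = buildPrompt(schema, "Ada Lovelace, ada@example.com"); // hypothetical input

const compact = JSON.stringify(prompt);         // no extra whitespace
const pretty = JSON.stringify(prompt, null, 2); // indented, for human reading only
console.log(compact.length < pretty.length);    // true
```

Pretty-printing is handy when logging prompts for debugging; sending the compact form avoids paying for indentation.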

Step 3: Update API calls

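```javascript
import OpenAI from "openai";
import { schemas } from "./schemas.js";
import { buildPrompt } from "./prompt.js";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Old: chatty prompt, free-form text back
// const completion = await openai.chat.completions.create({
//   model: "gpt-4-turbo",
//   messages: [{ role: "user", content: chattyPrompt }]
// });

// New: structured prompt, JSON mode
export async function extractUser(data) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [{
      role: "user",
      content: JSON.stringify(buildPrompt(schemas.userExtraction, data))
    }],
    response_format: { type: "json_object" },
    temperature: 0 // Lowers sampling variance; does not guarantee identical outputs
  });
  return JSON.parse(completion.choices[0].message.content);
}
```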
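`response_format: { type: "json_object" }` guarantees parseable JSON, not JSON that matches `schemas.userExtraction`, so a field can still be missing or mistyped. A minimal hand-rolled check for the subset of JSON Schema used in Step 1 (an object with string `properties`, an `email` format, and `required`); `validateAgainst` is hypothetical, and a full JSON Schema validator library would cover far more keywords:

```javascript
// Hypothetical validator for the JSON Schema subset used in schemas.js.
// Returns a list of problems; an empty list means the value passed.
function validateAgainst(schema, value) {
  // Assumes schema.type === "object", as in schemas.userExtraction.
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected a JSON object"];
  }
  const problems = [];
  for (const key of schema.required ?? []) {
    if (!(key in value)) problems.push(`missing required field "${key}"`);
  }
  for (const [key, rule] of Object.entries(schema.properties ?? {})) {
    if (!(key in value)) continue; // absence is already covered by `required`
    if (rule.type === "string" && typeof value[key] !== "string") {
      problems.push(`field "${key}" should be a string`);
    }
    // Deliberately loose email check: one "@" with non-space text on each side.
    if (rule.format === "email" && !/^[^@\s]+@[^@\s]+$/.test(String(value[key]))) {
      problems.push(`field "${key}" is not an email address`);
    }
  }
  return problems;
}

// Same shape as schemas.userExtraction from Step 1.
const userExtraction = {
  type: "object",
  properties: {
    name: { type: "string" },
    email: { type: "string", format: "email" },
    company: { type: "string" }
  },
  required: ["name", "email"]
};

console.log(validateAgainst(userExtraction, { name: "Ada", email: "ada@example.com" })); // []
console.log(validateAgainst(userExtraction, { name: "Ada" })); // [ 'missing required field "email"' ]
```

Rejecting or retrying on a non-empty list turns "you find out it's broken in production" into an explicit, logged failure at the call site.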

Total migration time: 4 hours for 47 endpoints.

The results after 21 days

| Metric | Before | After | Change |
|---|---|---|---|
| Avg tokens/call | 1,240 | 820 | -34% |
| Parse failures | 23% | 0% | -100% |
| Avg calls/task | 2.7 | 1.0 | -63% |
| Monthly cost | $4,100 | $1,107 | -73% |
| P95 latency | 2.3s | 1.1s | -52% |

Bonus: Our error rate dropped from 1.2% to 0.03%. Support tickets about "AI acting weird" went to zero.

When NOT to use JSON prompting

  • Creative writing (you want the fluff)
  • Exploratory analysis (you want reasoning prose)
  • Customer-facing chat (humans like "please")

For everything else (data extraction, classification, transformation, API-like tasks), JSON prompting is highly effective.

The stack is deterministic

The era of "prompt engineering as conversation" is fading. We are entering a phase where prompt engineering functions more like API design.

Your prompts are schemas. Your LLMs are functions. Your costs are predictable.

Start with one endpoint this week and measure the before/after. The savings may vary depending on your specific use case.

What's your biggest AI cost surprise? I'm collecting data for a follow-up on reasoning token optimization. Drop your numbers below.