How to Integrate AI and LLMs into Production Web Apps (Lessons from the Field)
Ahad Nawaz · 2026-05-29 · via DEV Community

Everyone is adding AI to their product right now. Most of them are doing it wrong.

Not because they chose the wrong model. Not because they used the wrong library. But because they treated AI integration like a regular feature and skipped all the engineering discipline that production systems require.

I have integrated LLMs into multiple production applications. This is what I wish I had known before I started.

The Mental Model Shift You Need First

A traditional API call is deterministic. You send a request, you get a predictable response. You can write tests against it. You can cache it. You can reason about it.

An LLM call is not deterministic. The same input can produce different outputs on different runs. The model can refuse, hallucinate, or return output in a format you did not expect. Your system needs to be designed around this reality, not in spite of it.

This means defensive parsing, fallback logic, output validation, and graceful degradation are not optional extras. They are the core of the feature.
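The sketch below makes "fallback logic and graceful degradation" concrete. It wraps a model call so that a failure or a slow response produces a labeled fallback instead of an error. The function name `withGracefulDegradation` and the timeout-based approach are illustrative assumptions, not any particular library's API.

```typescript
// Result of an AI-backed call: the model's value, or a labeled fallback.
type AiOutcome<T> =
  | { source: "model"; value: T }
  | { source: "fallback"; value: T; reason: string };

// Runs an LLM call but never lets it fail the request: a thrown error or a
// response slower than `timeoutMs` both degrade to the caller's fallback value.
async function withGracefulDegradation<T>(
  call: () => Promise<T>,
  fallback: T,
  timeoutMs: number,
): Promise<AiOutcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins; a late model response is simply ignored.
    const value = await Promise.race([call(), timeout]);
    return { source: "model", value };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { source: "fallback", value: fallback, reason };
  } finally {
    // Clear the timer so it does not keep the process alive after we answer.
    clearTimeout(timer);
  }
}
```

The `source` field lets the caller and your logs tell real model output apart from degraded responses.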

Choosing the Right Model for the Right Job

The biggest LLMs are not always the right choice. I learned this building EditDeck Pro, an AI creative platform for music.

Some tasks needed a large frontier model for nuanced creative output. Others needed a fast, cheap model that could run many times per session without accumulating significant latency or cost.

The pattern that works:

Use a lighter model for classification, extraction, and short structured outputs. Use a larger model for generation tasks where quality matters more than speed. Route dynamically between them based on the task type.

The savings depend on how much of your traffic the lighter model can handle. On workloads that mix simple and complex tasks, this can reduce inference costs by 60 to 80 percent.
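Here is a minimal sketch of that routing. It assumes a fixed mapping from task type to model tier. The task names and the model identifiers (`small-fast-model`, `frontier-model`) are placeholders, so substitute the models your provider offers.

```typescript
// Task categories the application knows about (illustrative, not exhaustive).
type TaskType = "classification" | "extraction" | "short_structured" | "generation";
type ModelTier = "light" | "large";

// Hypothetical model identifiers: replace with your provider's model names.
const MODELS: Record<ModelTier, string> = {
  light: "small-fast-model",
  large: "frontier-model",
};

// Cheap, frequent, structured tasks go to the light tier; quality-sensitive
// generation goes to the large tier.
const TIER_BY_TASK: Record<TaskType, ModelTier> = {
  classification: "light",
  extraction: "light",
  short_structured: "light",
  generation: "large",
};

function pickModel(task: TaskType): string {
  return MODELS[TIER_BY_TASK[task]];
}
```

Keeping the mapping in one table turns moving a task between tiers into a one-line change that you can evaluate on its own.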

Prompt Engineering Is Software Engineering

Prompts are code. They should be versioned, tested, and reviewed like code.

I store prompts in a dedicated module with version numbers. When I change a prompt I run it against a fixed evaluation set of inputs and compare the outputs to the previous version. If the quality drops on any test case, the change does not ship.

This sounds like overhead. It is not. Prompts drift over time as you iterate. Without a system to track changes, you will introduce regressions you cannot diagnose because you do not know what changed.
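The versioning and evaluation gate described above might look roughly like the sketch below. It rests on two assumptions. First, prompts are plain templates with an `{{input}}` placeholder. Second, `Scorer` is a hypothetical function that rates one prompt's output on one fixed evaluation input, whether through a rubric, an exact-match check, or a grader model.

```typescript
// A versioned prompt; `{{input}}` marks where the user input goes.
interface PromptVersion {
  id: string;
  version: number;
  template: string;
}

// One input from the fixed evaluation set (unchanged between versions).
interface EvalCase {
  input: string;
}

// Quality score in [0, 1] for a prompt on one case (assumed signature;
// in practice this runs the model and grades its output).
type Scorer = (prompt: PromptVersion, testCase: EvalCase) => number;

function renderPrompt(prompt: PromptVersion, input: string): string {
  return prompt.template.split("{{input}}").join(input);
}

// Inputs of every case where the candidate scores worse than the current version.
function findRegressions(
  current: PromptVersion,
  candidate: PromptVersion,
  cases: EvalCase[],
  score: Scorer,
  tolerance = 0,
): string[] {
  return cases
    .filter((c) => score(candidate, c) < score(current, c) - tolerance)
    .map((c) => c.input);
}

// The change ships only if no case regressed.
function canShip(
  current: PromptVersion,
  candidate: PromptVersion,
  cases: EvalCase[],
  score: Scorer,
  tolerance = 0,
): boolean {
  return findRegressions(current, candidate, cases, score, tolerance).length === 0;
}
```

With the default `tolerance` of 0, any drop on any case blocks the change, which matches the rule above. Model outputs vary between runs, so you may want to average several runs per case, or allow a small tolerance, so that noise alone does not block a release.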

A practical prompt structure that works well across most tasks:

  1. Role and context definition
  2. Task description with explicit constraints
  3. Output format specification
  4. One or two examples if the task is complex

Keep prompts short and explicit. Long prompts with conflicting instructions produce inconsistent outputs.
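A small builder is one way to assemble that four-part structure consistently. The `PromptSpec` shape and the section labels below are illustrative choices, not a required format.

```typescript
interface PromptExample {
  input: string;
  output: string;
}

interface PromptSpec {
  role: string; // 1. role and context
  task: string; // 2. task description...
  constraints: string[]; //    ...with explicit constraints
  outputFormat: string; // 3. output format specification
  examples?: PromptExample[]; // 4. optional, for complex tasks
}

function buildPrompt(spec: PromptSpec): string {
  const sections: string[] = [spec.role, `Task: ${spec.task}`];
  if (spec.constraints.length > 0) {
    sections.push(["Constraints:", ...spec.constraints.map((c) => `- ${c}`)].join("\n"));
  }
  sections.push(`Output format: ${spec.outputFormat}`);
  // Keep the prompt short: at most two examples.
  const examples = (spec.examples ?? []).slice(0, 2);
  if (examples.length > 0) {
    const rendered = examples.map((e) => `Input: ${e.input}\nOutput: ${e.output}`);
    sections.push(["Examples:", ...rendered].join("\n"));
  }
  return sections.join("\n\n");
}
```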

Handling Async LLM Calls in Node.js

LLM calls are slow. A typical generation can take two to ten seconds. For most interactions, you cannot make users wait on a synchronous response.

The architecture that works best for most production use cases:

When the user triggers an AI action, the API immediately returns a job ID and sets the status to processing. A background worker handles the actual LLM call. The frontend polls for status or receives an update over WebSocket when the job completes.

This keeps your API response times predictable, lets you retry failed jobs, and gives you visibility into queue depth and processing time.
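Here is a minimal in-memory sketch of the job-ID pattern. It assumes you supply a `callModel` function. A real deployment would use a durable queue and a separate worker process so that jobs survive restarts. The class name `LlmJobQueue` and the default limit of three attempts are hypothetical.

```typescript
type JobStatus = "processing" | "done" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  attempts: number;
  result?: string;
  error?: string;
}

// In-memory stand-in for a durable queue plus worker (illustrative only).
class LlmJobQueue {
  private readonly jobs = new Map<string, Job>();
  private readonly pending: { id: string; input: string }[] = [];
  private nextId = 1;
  private readonly callModel: (input: string) => Promise<string>;
  private readonly maxAttempts: number;

  constructor(callModel: (input: string) => Promise<string>, maxAttempts = 3) {
    this.callModel = callModel;
    this.maxAttempts = maxAttempts;
  }

  // API handler: enqueue the work and return the job ID immediately.
  submit(input: string): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { id, status: "processing", attempts: 0 });
    this.pending.push({ id, input });
    return id;
  }

  // Polling endpoint (or WebSocket notifier): read the current job state.
  getStatus(id: string): Job | undefined {
    return this.jobs.get(id);
  }

  // Worker: process one queued job. Failed attempts are requeued until
  // maxAttempts is reached. Returns false when the queue is empty.
  async processNext(): Promise<boolean> {
    const next = this.pending.shift();
    if (!next) return false;
    const job = this.jobs.get(next.id);
    if (!job) return true;
    job.attempts += 1;
    try {
      job.result = await this.callModel(next.input);
      job.status = "done";
    } catch (err) {
      if (job.attempts < this.maxAttempts) {
        this.pending.push(next); // retry later
      } else {
        job.status = "failed";
        job.error = err instanceof Error ? err.message : String(err);
      }
    }
    return true;
  }
}
```

The API handler calls `submit` and returns the ID at once. A worker loop calls `processNext`. The status endpoint, or the WebSocket notifier, reads `getStatus`. Because each job records its attempts and its final error, the retry and visibility benefits described above follow directly.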

For streaming responses, where you want to show output in real time as the model generates it, use Server-Sent Events (SSE). SSE is simpler than WebSockets for one-way streaming and is well supported in Node.js, including NestJS.
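SSE is a plain-text format sent over an ordinary HTTP response. Each event is one or more `data:` lines followed by a blank line, and a multi-line payload needs one `data:` line per line of text. The sketch below assumes only a response object with `setHeader`, `write`, and `end` methods, which Node's `http.ServerResponse` provides. The `openSseStream` name and the closing `done` event carrying `[DONE]` are illustrative conventions, not part of the SSE standard.

```typescript
// The minimal response surface we need; Node's http.ServerResponse satisfies it.
interface SseResponse {
  setHeader(name: string, value: string): unknown;
  write(chunk: string): unknown;
  end(): unknown;
}

// Formats one SSE event: optional "event:" line, one "data:" line per payload
// line, and a blank line that terminates the event.
function sseFrame(data: string, event?: string): string {
  const head = event ? [`event: ${event}`] : [];
  const lines = data.split("\n").map((line) => `data: ${line}`);
  return [...head, ...lines].join("\n") + "\n\n";
}

// Call this from your streaming callback: send() for each chunk the model
// emits, close() when generation finishes.
function openSseStream(res: SseResponse) {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  return {
    send(data: string, event?: string): void {
      res.write(sseFrame(data, event));
    },
    close(): void {
      res.write(sseFrame("[DONE]", "done")); // illustrative end-of-stream marker
      res.end();
    },
  };
}
```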

Output Validation Is Non-Negotiable

If you ask an LLM to return JSON, it will sometimes return malformed JSON. If you ask it to follow a schema, it will occasionally miss a required field. If you ask it to stay within a character limit, it will sometimes exceed it.

Every LLM response in a production system should go through a validation layer before it reaches the user or gets stored in the database.

I use Zod for schema validation in TypeScript. The pattern looks like this: parse the model output, validate it against the expected schema, and if validation fails, either retry the call with the validation error included in the prompt or return a graceful fallback response to the user.
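The sketch below implements the parse, validate, then retry-or-fallback loop without external dependencies. The article uses Zod. To keep the example self-contained, `validateSuggestion` hand-rolls the checks that a Zod schema would express, and a comment shows the equivalent schema. The `TrackSuggestion` shape, the single retry, and the `callModel` function are assumptions for illustration.

```typescript
// Shape we expect the model to return (illustrative).
interface TrackSuggestion {
  title: string;
  mood: string;
  bpm: number;
}

type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Hand-rolled equivalent of a Zod schema like:
//   z.object({ title: z.string().min(1), mood: z.string(), bpm: z.number().int().positive() })
function validateSuggestion(raw: string): Validation<TrackSuggestion> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "Response was not valid JSON." };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, error: "Expected a JSON object." };
  }
  const { title, mood, bpm } = data as Record<string, unknown>;
  if (typeof title !== "string" || title.length === 0) {
    return { ok: false, error: 'Field "title" must be a non-empty string.' };
  }
  if (typeof mood !== "string") {
    return { ok: false, error: 'Field "mood" must be a string.' };
  }
  if (typeof bpm !== "number" || !Number.isInteger(bpm) || bpm <= 0) {
    return { ok: false, error: 'Field "bpm" must be a positive integer.' };
  }
  return { ok: true, value: { title, mood, bpm } };
}

// Validate every reply; on failure, retry with the validation error in the
// prompt. If all attempts fail, return the fallback, never the raw output.
async function generateValidated(
  callModel: (prompt: string) => Promise<string>,
  basePrompt: string,
  fallback: TrackSuggestion,
  maxRetries = 1,
): Promise<TrackSuggestion> {
  let prompt = basePrompt;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = validateSuggestion(await callModel(prompt));
    if (result.ok) return result.value;
    prompt = `${basePrompt}\n\nYour previous reply was rejected: ${result.error} Reply with JSON only.`;
  }
  return fallback;
}
```

This sketch does not handle errors thrown by `callModel` itself, such as timeouts or outages. Wrap that call in the same kind of fallback you would use for any other external dependency.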

Never pass raw LLM output directly to your frontend or database without validation.

Rate Limiting and Cost Controls

LLM API costs can escalate quickly. A single user making many requests in a session can generate significant spend if you do not have controls in place.

In every AI feature I build, I implement:

  1. Per-user rate limits at the API gateway level.
  2. Daily and monthly spend limits per workspace or account.
  3. Usage logging, so you can analyze which features generate the most cost.
  4. Automatic fallback to a cheaper model when the primary model is rate limited or slow.

Set hard cost limits in your provider dashboard as a safety net. You do not want to discover a runaway process or an abuse pattern through your invoice.
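Here is a single-process sketch of a per-user rate limit and a per-account daily spend cap, with cost also logged per feature. It assumes fixed one-minute windows, costs tracked in cents, and UTC calendar days, and `UsageGuard` is a hypothetical name. If you run more than one server instance, these counters would need to live in a shared store such as Redis. A monthly cap works the same way as the daily one.

```typescript
interface UsageGuardConfig {
  requestsPerMinute: number; // per-user rate limit
  dailyBudgetCents: number; // per-account spend cap
}

type Decision =
  | { allowed: true }
  | { allowed: false; reason: "rate_limited" | "budget_exceeded" };

function utcDay(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10); // "YYYY-MM-DD"
}

// Single-process sketch: in-memory counters, fixed one-minute windows, UTC days.
class UsageGuard {
  private readonly config: UsageGuardConfig;
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly spend = new Map<string, { day: string; cents: number }>();
  private readonly costByFeature = new Map<string, number>();

  constructor(config: UsageGuardConfig) {
    this.config = config;
  }

  // Call before each model request.
  check(userId: string, accountId: string, now: number = Date.now()): Decision {
    const windowStart = now - (now % 60_000);
    const w = this.windows.get(userId);
    const count = w && w.start === windowStart ? w.count : 0;
    if (count >= this.config.requestsPerMinute) {
      return { allowed: false, reason: "rate_limited" };
    }
    if (this.spentToday(accountId, now) >= this.config.dailyBudgetCents) {
      return { allowed: false, reason: "budget_exceeded" };
    }
    this.windows.set(userId, { start: windowStart, count: count + 1 });
    return { allowed: true };
  }

  // Call after each model response with its actual cost; also logs per feature.
  recordCost(accountId: string, feature: string, cents: number, now: number = Date.now()): void {
    this.spend.set(accountId, { day: utcDay(now), cents: this.spentToday(accountId, now) + cents });
    this.costByFeature.set(feature, (this.costByFeature.get(feature) ?? 0) + cents);
  }

  spentToday(accountId: string, now: number = Date.now()): number {
    const entry = this.spend.get(accountId);
    return entry && entry.day === utcDay(now) ? entry.cents : 0;
  }

  featureCost(feature: string): number {
    return this.costByFeature.get(feature) ?? 0;
  }
}
```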

Caching LLM Responses Intelligently

Not all LLM calls need to go to the model on every request. For any task where the same input reliably produces equivalent output, caching can dramatically reduce both latency and cost.

Semantic caching is particularly useful here. Instead of exact-match caching, you embed the input and store the response alongside that embedding vector. When a new input arrives, you embed it too and return the cached response if its similarity to a stored input (typically cosine similarity) is above a threshold.

This works well for FAQ-style features, content suggestions based on category, and any task where slight variations in input should produce the same response.
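On the lookup side, a semantic cache reduces to a nearest-neighbor search with a similarity cutoff. This sketch assumes you already have an embedding vector for each input, produced by whatever embeddings API you use, and it does a linear scan with cosine similarity. At scale, a vector database would replace the scan. The class name and the threshold value are illustrative.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private readonly entries: { vector: number[]; response: string }[] = [];
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Returns the response of the most similar stored input, if it clears the threshold.
  lookup(vector: number[]): string | undefined {
    let best: { score: number; response: string } | undefined;
    for (const entry of this.entries) {
      const score = cosineSimilarity(vector, entry.vector);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, response: entry.response };
      }
    }
    return best?.response;
  }

  store(vector: number[], response: string): void {
    this.entries.push({ vector, response });
  }
}
```

On a miss, call the model and `store` the input's embedding together with the response. The threshold is the main tuning knob. If you set it too low, users start receiving answers to questions they did not ask.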

What to Monitor in Production

Standard application monitoring is not enough for AI features. You need to track:

  1. Latency per model and per task type.
  2. Token usage per request, broken down into input and output tokens.
  3. Validation failure rates, which indicate prompt quality issues.
  4. User-level engagement with AI-generated content, which tells you whether the outputs are actually useful.

A feature that generates outputs users never interact with is not a working feature, regardless of whether the API calls succeed.
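The metrics above can be aggregated per model and task type from one record per call. The `CallRecord` fields are assumptions. `userEngaged` stands for whatever signal your product treats as engagement (for example accepting, copying, or editing an output), and that signal usually arrives only after the call completes. The sketch shows averages for brevity; in production you would also want latency percentiles.

```typescript
interface CallRecord {
  model: string;
  task: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  validationFailed: boolean;
  userEngaged: boolean; // app-defined engagement signal (assumed)
}

interface Summary {
  calls: number;
  avgLatencyMs: number;
  inputTokens: number;
  outputTokens: number;
  validationFailureRate: number;
  engagementRate: number;
}

// Groups records by "model/task" and computes per-group totals and rates.
function summarize(records: CallRecord[]): Map<string, Summary> {
  const groups = new Map<string, CallRecord[]>();
  for (const r of records) {
    const key = `${r.model}/${r.task}`;
    const list = groups.get(key);
    if (list) list.push(r);
    else groups.set(key, [r]);
  }
  const out = new Map<string, Summary>();
  groups.forEach((rs, key) => {
    const n = rs.length;
    const sum = (f: (r: CallRecord) => number) => rs.reduce((s, r) => s + f(r), 0);
    out.set(key, {
      calls: n,
      avgLatencyMs: sum((r) => r.latencyMs) / n,
      inputTokens: sum((r) => r.inputTokens),
      outputTokens: sum((r) => r.outputTokens),
      validationFailureRate: sum((r) => (r.validationFailed ? 1 : 0)) / n,
      engagementRate: sum((r) => (r.userEngaged ? 1 : 0)) / n,
    });
  });
  return out;
}
```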

The Biggest Mistake Teams Make

Shipping an AI feature without a way to turn it off.

Model behavior changes when providers update their models. Provider APIs have incidents. Your prompt may suddenly produce bad outputs for a class of inputs you did not anticipate.

Every AI feature should have a feature flag that lets you disable it instantly without a code deployment. Where possible, the fallback should be a non-AI version of the same functionality.

This is not pessimism. It is the same defensive engineering you apply to any external dependency.
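Below is a minimal sketch of a kill switch. The flag is read on every request, so flipping it in your flag service or config store takes effect without a deployment. `FlagStore`, `InMemoryFlags`, and `runWithKillSwitch` are hypothetical names. Two behaviors are design choices rather than requirements: unknown flags default to off, and the code falls back to the non-AI path when the AI path throws.

```typescript
// Flag source consulted at request time (your flag service or config table).
interface FlagStore {
  isEnabled(flag: string): boolean;
}

// In-memory stand-in for a real flag service.
class InMemoryFlags implements FlagStore {
  private readonly flags = new Map<string, boolean>();

  set(flag: string, enabled: boolean): void {
    this.flags.set(flag, enabled);
  }

  // Unknown flags default to off, so missing config disables the AI path.
  isEnabled(flag: string): boolean {
    return this.flags.get(flag) ?? false;
  }
}

async function runWithKillSwitch<T>(
  flags: FlagStore,
  flag: string,
  aiPath: () => Promise<T>,
  nonAiPath: () => Promise<T>,
): Promise<T> {
  if (!flags.isEnabled(flag)) return nonAiPath();
  try {
    return await aiPath();
  } catch {
    // Treat the AI path like any external dependency: degrade, don't fail.
    return nonAiPath();
  }
}
```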

Where to Start

If you are adding your first LLM integration to a production application, start with a low-stakes, read-only feature. Summaries, suggestions, and search enhancements are good first candidates. They add value without being in the critical path. That gives you room to learn how the model behaves in your specific context before you build anything that writes data or makes decisions.

Get the infrastructure right first. Async handling, output validation, rate limiting, monitoring. Then expand.

AI is powerful software. It rewards the same engineering discipline that all powerful software requires.


I am Ahad, Founder of REIVEX Technologies. I build AI platforms and production web systems for clients across the US, Middle East, and South Asia. See more at ahadnawaz.dev.