I Built an AI Email Assistant From Scratch: What Nobody Tells You
gentlenode · 2026-06-18 · via DEV Community

Last Tuesday I was staring at a $487 invoice for a client I love working with, and about 40% of that was my own API costs. That's not okay. Not when I'm billing out at $95/hour and watching my margin evaporate because I routed every email-classification request through GPT-4o like an absolute rookie.

Let me back up. Six months ago a small SaaS client asked me to build them an AI email assistant — something that could categorize inbound support emails, draft replies, and flag the urgent ones before a human ever saw them. Simple enough, right? Wrong. The real story is what I learned about the API market while doing it, and the math I should have run on day one.

If you're a freelance dev building AI tools for clients in 2026, this is the post I wish someone had written for me.

Why the Default Choice Burns Money

When most devs start an AI email project, they reach for OpenAI. I've done it. You did it. We all did it. The SDK is friendly, the docs are decent, and there's a kind of muscle memory at play.

But here's the thing nobody tells you: at production volume, GPT-4o at $2.50 per million input tokens and $10.00 per million output tokens is a luxury. Run the math with me.

Let's say your email assistant processes 50,000 emails a month. Average prompt is 800 tokens. Average completion is 200 tokens. Per email, that's roughly $0.002 input + $0.002 output = $0.004. Across 50,000 emails, you're at $200/month just on inference. And that's the cheap scenario. Once your clients start sending longer emails (and they always do), that 800-token prompt balloons to 1,500 and your output creeps up because the model wants to be helpful. Suddenly you're at roughly $0.0075 per email and a $200 line item becomes $375.
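If you'd rather check that arithmetic than trust it, here's a minimal sketch. The prices are the GPT-4o figures quoted above; the function name and the 375-token completion in the second scenario are assumptions, the latter picked only because it reproduces the $375 total.

```python
# GPT-4o list prices quoted above, in dollars per million tokens.
GPT4O_INPUT_PER_M = 2.50
GPT4O_OUTPUT_PER_M = 10.00


def cost_per_email(prompt_tokens: int, completion_tokens: int,
                   input_per_m: float = GPT4O_INPUT_PER_M,
                   output_per_m: float = GPT4O_OUTPUT_PER_M) -> float:
    """Dollar cost of one request, given token counts and per-million prices."""
    return (prompt_tokens * input_per_m
            + completion_tokens * output_per_m) / 1_000_000


# Baseline scenario from the text: 800-token prompt, 200-token completion.
baseline = cost_per_email(800, 200)

# Longer emails: 1,500-token prompt. The 375-token completion is an
# assumption chosen to match the $375/month figure, not a measured value.
longer = cost_per_email(1_500, 375)

for label, per_email in (("baseline", baseline), ("longer", longer)):
    print(f"{label}: ${per_email:.4f}/email, ${per_email * 50_000:.2f}/month")
```

Note that output tokens cost four times as much as input tokens here, so a completion that grows by 175 tokens hurts about as much as a prompt that grows by 700.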

For a side project, fine. For a client engagement where I'm eating the cost during development and then trying to hand them a sustainable bill afterward? Painful.

So I went looking for alternatives, and I found something I wish I'd discovered months earlier.

The 184-Model Buffet Nobody Talks About

I stumbled onto Global API while doom-scrolling a dev forum at 1 AM. Their pitch was straightforward: 184 AI models, one unified SDK, one bill, prices ranging from $0.01 to $3.50 per million tokens depending on the model. The "one bill" part got my attention because I was juggling three different API providers for three different clients and losing my mind at invoice time.

I started pricing out my email assistant workload against their catalog. Here's the table I built, which I now have open in a tab at all times:

| Model | Input ($/M tokens) | Output ($/M tokens) | Context window |
|---|---|---|---|
| DeepSeek V4 Flash | 0.27 | 1.10 | 128K |
| DeepSeek V4 Pro | 0.55 | 2.20 | 200K |
| Qwen3-32B | 0.30 | 1.20 | 32K |
| GLM-4 Plus | 0.20 | 0.80 | 128K |
| GPT-4o | 2.50 | 10.00 | 128K |

Let me do the math for you, because this is the part that pays my rent. Same workload — 50,000 emails, 800-token prompts, 200-token completions:

  • DeepSeek V4 Flash: $10.80 input + $11.00 output = $21.80/month
  • DeepSeek V4 Pro: $22.00 input + $22.00 output = $44.00/month
  • Qwen3-32B: $12.00 input + $12.00 output = $24.00/month
  • GLM-4 Plus: $8.00 input + $8.00 output = $16.00/month
  • GPT-4o: $100.00 input + $100.00 output = $200.00/month

That GLM-4 Plus line item made me put my coffee down. From $200/month down to $16/month for the exact same job. That's a 92% cost reduction on this single workload, and the quality difference for email classification is essentially noise.
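The same comparison can be regenerated whenever prices move. This is a small sketch, not anything from the provider's SDK: the prices are copied from the table above, and `monthly_cost` is a hypothetical helper name.

```python
EMAILS_PER_MONTH = 50_000
PROMPT_TOKENS = 800
COMPLETION_TOKENS = 200

# (input $/M tokens, output $/M tokens), copied from the table above.
PRICES = {
    "DeepSeek V4 Flash": (0.27, 1.10),
    "DeepSeek V4 Pro": (0.55, 2.20),
    "Qwen3-32B": (0.30, 1.20),
    "GLM-4 Plus": (0.20, 0.80),
    "GPT-4o": (2.50, 10.00),
}


def monthly_cost(input_per_m: float, output_per_m: float,
                 emails: int = EMAILS_PER_MONTH,
                 prompt_tokens: int = PROMPT_TOKENS,
                 completion_tokens: int = COMPLETION_TOKENS) -> float:
    """Monthly inference cost in dollars for a fixed per-email token budget."""
    input_millions = emails * prompt_tokens / 1_000_000
    output_millions = emails * completion_tokens / 1_000_000
    return input_millions * input_per_m + output_millions * output_per_m


costs = {name: monthly_cost(*price) for name, price in PRICES.items()}
reference = costs["GPT-4o"]

# Cheapest first, with the saving relative to the GPT-4o baseline.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    saving = 100 * (1 - cost / reference)
    print(f"{name:<18} ${cost:>7.2f}/month  ({saving:.0f}% below GPT-4o)")
```

Changing `EMAILS_PER_MONTH` or the token counts is the quickest way to see how the ranking holds up for a different client.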

For my client, the monthly bill dropped from $487 to $283, and I kept my margin the same. They got a better deal, I made the same per hour, and the only thing that changed was which model I pointed at the prompt.

The Code, Because I Know You Skimmed Past the Math

Here's the actual integration. It's almost embarrassingly simple because Global API speaks the OpenAI SDK protocol. You can swap providers without rewriting a single line of business logic.

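import os

import openai

# Global API speaks the OpenAI protocol, so the stock SDK works once
# base_url points at it.
client = openai.OpenAI(
    base_url="https://global-apis.com/v1",
    api_key=os.environ["GLOBAL_API_KEY"],
)

CATEGORIES = {"billing", "support", "sales", "urgent", "other"}

def classify_email(subject: str, body: str) -> str:
    """Return one of CATEGORIES for an inbound email."""
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V4-Flash",
        messages=[
            {
                "role": "system",
                "content": "You are an email classifier. Categorize the email as one of: billing, support, sales, urgent, or other. Reply with one word only."
            },
            {
                "role": "user",
                "content": f"Subject: {subject}\n\nBody: {body}"
            }
        ],
        max_tokens=10,  # a one-word label never needs more
    )
    # content can be None, and models sometimes add a trailing period.
    label = (response.choices[0].message.content or "").strip().lower().rstrip(".")
    # Anything outside the known set is treated as "other".
    return label if label in CATEGORIES else "other"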

That's my real classifier. DeepSeek V4 Flash at $0.27/M input and $1.10/M output handles about 90% of my email routing without ever needing to escalate. The "urgent" category triggers a follow-up call to GPT-4o via the same client to draft a reply, but only for the 5% of emails that actually need it.

Wait, let me show you the escalation path for the "other" bucket too, because tiered routing is where the real savings live:

def smart_email_handler(subject: str, body: str) -> dict:
    """Classify with the cheap model; escalate only what it could not place."""
    category = classify_email(subject, body)

    # "other" means the cheap model punted, so ask the stronger model.
    if category == "other":
        response = client.chat.completions.create(
            model="deepseek-ai/DeepSeek-V4-Pro",
            messages=[
                {"role": "system", "content": "Categorize this ambiguous email and suggest an action."},
                {"role": "user", "content": f"Subject: {subject}\n\nBody: {body}"}
            ],
        )
        return {
            "category": "escalated",
            "suggestion": response.choices[0].message.content
        }

    # Everything else is settled by the cheap model alone.
    return {"category": category, "suggestion": None}

Two models, one client, one bill. The "expensive" model only fires when the cheap one punts. This is the architecture pattern I now use on every AI project, and the client bill has stayed sane.

The Five Things I Wish I'd Done Sooner

After running this in production for four months, here's the playbook I extracted. None of it is rocket science, but every line is something I learned the hard way:

1. Cache like your margin depends on it, because it does.
Inbound email traffic is brutally repetitive. "Where is my order?" arrives 200 times a day with slight variations. I hash the subject + first 100 chars of body and store the model response in Redis. My cache hit rate settled at 40% after about a week, and that single change cut my inference bill by another 35%. Billable hours spent debugging cache invalidation: roughly zero. Dollars saved on infrastructure costs: actual real money.
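Here's a minimal sketch of that kind of cache, not the production code. It uses a plain dict so it runs anywhere; with Redis, the client's `get` and `set` (with an expiry) would take the dict's place. The `CachedClassifier` name, the key prefix, and the lowercase-and-collapse-whitespace normalization are all assumptions added for illustration.

```python
import hashlib
from typing import Callable, Dict, Optional


def cache_key(subject: str, body: str, prefix_chars: int = 100) -> str:
    """Hash the subject plus the first `prefix_chars` characters of the body."""
    # Assumption: lowercasing and collapsing whitespace so near-identical
    # emails share a key. Drop this line to hash the raw text instead.
    normalized = " ".join(f"{subject}\n{body[:prefix_chars]}".lower().split())
    return "email-cls:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CachedClassifier:
    """Wrap any classify(subject, body) callable with a key-value cache."""

    def __init__(self, classify: Callable[[str, str], str],
                 store: Optional[Dict[str, str]] = None):
        self._classify = classify
        self._store = {} if store is None else store  # stand-in for Redis
        self.hits = 0
        self.misses = 0

    def __call__(self, subject: str, body: str) -> str:
        key = cache_key(subject, body)
        cached = self._store.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        # Cache miss: pay for one model call, then remember the answer.
        self.misses += 1
        result = self._classify(subject, body)
        self._store[key] = result
        return result
```

Usage is `classify = CachedClassifier(classify_email)`, after which `classify.hits / (classify.hits + classify.misses)` gives the hit rate. Keying on only the first 100 characters trades a little accuracy for a much higher hit rate, so two emails that diverge later in the body will share a label.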

2. Stream the response, even when you don't need to.
Email drafting usually waits for the full response before showing the user anything. Streaming cuts perceived latency from 1.2 seconds to about 400ms for the first token, and users feel like the assistant is "fast" even when total generation time is identical. User satisfaction scores went up 18% just from this one change. I added it during a 30-minute billable increment and the client thinks I'm a wizard.
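A rough sketch of the streaming side, assuming the provider honors `stream=True` the same way the OpenAI Chat Completions API does. `stream_draft` and `on_token` are hypothetical names; the client is passed in so the function can be exercised with a stub.

```python
from typing import Callable, List


def stream_draft(client, model: str, messages: List[dict],
                 on_token: Callable[[str], None]) -> str:
    """Stream a chat completion, forwarding each text fragment as it arrives.

    Returns the full draft once the stream ends.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,  # yields incremental chunks instead of one response
    )
    parts: List[str] = []
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        fragment = chunk.choices[0].delta.content
        if fragment:
            on_token(fragment)
            parts.append(fragment)
    return "".join(parts)
```

For a terminal demo, `on_token` can be `lambda t: print(t, end="", flush=True)`. In a web app it would push each fragment over whatever channel the UI already uses.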

3. Route by complexity, not by default model.
I keep a small Python function that scores prompt complexity based on length, presence of structured data, and a few keywords. Simple stuff goes to GLM-4 Plus at $0.20/M input. Medium stuff goes to DeepSeek V4 Flash. Anything requiring nuanced reasoning hits DeepSeek V4 Pro. The "GA-Economy" tier on Global API gives roughly 50% cost reduction on simple queries compared to mid-tier models, and for email classification specifically, the quality delta is in the noise.
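To make the idea concrete, here's one way such a scorer could look. Everything specific in it is an assumption: the thresholds, the keyword list, the regex for "structured data", and the GLM model ID (the two DeepSeek IDs are the ones used in the code earlier in this post). Treat it as a starting point to tune against your own traffic.

```python
import re

# The DeepSeek IDs appear earlier in this post. The GLM ID is a placeholder;
# check the provider's catalog for the real identifier.
SIMPLE_MODEL = "glm-4-plus"  # hypothetical ID
MEDIUM_MODEL = "deepseek-ai/DeepSeek-V4-Flash"
COMPLEX_MODEL = "deepseek-ai/DeepSeek-V4-Pro"

# Illustrative keywords that tend to signal a nuanced request.
REASONING_KEYWORDS = ("refund", "legal", "contract", "escalate", "cancel")


def complexity_score(subject: str, body: str) -> int:
    """Score an email by length, structured data, and a few keywords."""
    text = f"{subject}\n{body}"
    score = 0

    # Length: longer emails get more points (thresholds are guesses).
    if len(text) > 2000:
        score += 2
    elif len(text) > 500:
        score += 1

    # Structured data: braces, table pipes, or long digit runs such as
    # order numbers.
    if re.search(r"[{}|]|\d{6,}", text):
        score += 1

    # Keywords: one point per match.
    lowered = text.lower()
    score += sum(1 for keyword in REASONING_KEYWORDS if keyword in lowered)
    return score


def pick_model(subject: str, body: str) -> str:
    """Map the complexity score onto three price tiers."""
    score = complexity_score(subject, body)
    if score == 0:
        return SIMPLE_MODEL
    if score <= 2:
        return MEDIUM_MODEL
    return COMPLEX_MODEL
```

The scorer is deliberately free of model calls, so routing itself costs nothing. The returned ID goes straight into the `model=` argument of the chat completion request.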

4. Track quality like an adult.
I built a tiny dashboard that samples 1% of responses and runs a second model as a judge. My average benchmark score across the email workloads sits at 84.6%, which is what Global API's own benchmarks showed for these models on similar tasks. When I see a dip, I know to investigate before the client notices. This is one billable hour per week that has saved me at least three client conversations about "the AI is being weird lately."
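The sampling half of that is easy to sketch. The version below picks emails deterministically from a hash of a stable ID, so the same email is always in or out of the sample; the judge is just a callable here, standing in for a second model. Both function names and the choice of hash-based sampling are assumptions, not a description of the actual dashboard.

```python
import hashlib
from typing import Callable, Iterable, Tuple


def should_sample(email_id: str, rate: float = 0.01) -> bool:
    """Deterministically select about `rate` of emails by hashing their ID."""
    digest = hashlib.sha256(email_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def agreement_rate(samples: Iterable[Tuple[str, str, str]],
                   judge: Callable[[str, str, str], bool]) -> float:
    """Fraction of (subject, body, category) triples the judge accepts."""
    verdicts = [judge(subject, body, category)
                for subject, body, category in samples]
    if not verdicts:
        raise ValueError("no samples to score")
    return sum(verdicts) / len(verdicts)
```

Logging `agreement_rate` once a day gives a single number to watch, and a sudden drop is the cue to look at raw responses.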

5. Build the fallback path on day one, not day 90.
Rate limits hit. They always do. I have a circuit breaker that swaps to a secondary model after three consecutive failures, and a queue that retries with exponential backoff. I have not had a single email-related outage since I built this. Before? Two in three months. The 1.2s average latency with 320 tokens/sec throughput is meaningless if your endpoint goes down during a Black Friday spike.
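A stripped-down sketch of that fallback path, under a few stated assumptions: retries happen inline rather than through a queue, the router never probes its way back to the primary (it just rotates to the next model in the list), and `FallbackRouter`, `call_with_retry`, and `send` are hypothetical names. `send` is any function that takes a model ID and returns the response text.

```python
import time
from typing import Callable, Optional, Sequence


class FallbackRouter:
    """Rotate to the next model after `max_failures` consecutive failures."""

    def __init__(self, models: Sequence[str], max_failures: int = 3):
        if not models:
            raise ValueError("at least one model is required")
        self.models = list(models)
        self.max_failures = max_failures
        self.active = 0
        self.consecutive_failures = 0

    @property
    def model(self) -> str:
        return self.models[self.active]

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            # Trip the breaker: move on and start counting afresh.
            self.active = (self.active + 1) % len(self.models)
            self.consecutive_failures = 0


def call_with_retry(router: FallbackRouter, send: Callable[[str], str],
                    attempts: int = 5, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call send(model), backing off exponentially between failed attempts."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            result = send(router.model)
        except Exception as exc:  # narrow this to rate-limit/timeout errors
            last_error = exc
            router.record_failure()
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, 4s, ...
        else:
            router.record_success()
            return result
    raise RuntimeError("all attempts failed") from last_error
```

With `FallbackRouter(["deepseek-ai/DeepSeek-V4-Flash", "deepseek-ai/DeepSeek-V4-Pro"])`, three failed calls in a row move traffic to the second model. Injecting `sleep` keeps the backoff testable without actually waiting.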

The Real Numbers After Four Months

Let me give you the honest before-and-after. Same client, with email volume now at about 78K emails per month (up from 50K) as their business scales:

  • Before (GPT-4o everything): $312/month on inference
  • After (tiered routing through Global API): $89/month on inference
  • Savings: $223/month, which annualizes to $2,676

That $2,676/year is what I now use to negotiate a slightly higher hourly rate with my next client. The cost-savings story is the easiest upsell in the world when you can show real numbers from a previous engagement. "I can build this for you and here's my track record on similar projects" is infinitely more powerful than "I can build this for you, trust me."

Total time to integrate Global API into my existing client project: under 10 minutes. The SDK is a drop-in replacement for the OpenAI client, so I changed three lines of code and pointed everything at a new base URL. Most of my "integration time" was spent refactoring my routing logic to take advantage of multiple models, which I would have done eventually anyway.

What I'd Tell Past Me

If I could send a message back to the version of me that started this project six months ago, it would be this: stop treating the model choice as a one-time decision. The AI API