I Built an AI Email Assistant From Scratch: What Nobody Tells You

Last Tuesday I was staring at a $487 invoice for a client I love working with, and about 40% of that was my own API costs. That's not okay. Not when I'm billing out at $95/hour and watching my margin evaporate because I routed every email-classification request through GPT-4o like an absolute rookie.

Let me back up. Six months ago a small SaaS client asked me to build them an AI email assistant — something that could categorize inbound support emails, draft replies, and flag the urgent ones before a human ever saw them. Simple enough, right? Wrong. The real story is what I learned about the API market while doing it, and the math I should have run on day one.

If you're a freelance dev building AI tools for clients in 2026, this is the post I wish someone had written for me.

Why the Default Choice Burns Money

When most devs start an AI email project, they reach for OpenAI. I've done it. You did it. We all did it. The SDK is friendly, the docs are decent, and there's a kind of muscle memory at play.

But here's the thing nobody tells you: at production volume, GPT-4o at $2.50 per million input tokens and $10.00 per million output tokens is a luxury. Run the math with me.

Let's say your email assistant processes 50,000 emails a month. Average prompt is 800 tokens. Average completion is 200 tokens. Per email, that's $0.002 input + $0.002 output = $0.004. Across 50,000 emails, you're at $200/month just on inference. And that's the cheap scenario. Once your clients start sending longer emails (and they always do), that 800-token prompt balloons to 1,500 and your output creeps up because the model wants to be helpful. Suddenly you're at roughly $0.0075 per email and a $200 line item becomes $375.
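A minimal sketch of that arithmetic, for anyone who wants to plug in their own volumes. The function name `cost_per_email` is made up for illustration, the prices are the GPT-4o rates quoted above, and the 375-token completion in the second scenario is an assumption picked so the total lands on the $375 figure (the text only says output "creeps up").

```python
def cost_per_email(prompt_tokens, completion_tokens, input_price_per_m, output_price_per_m):
    """Return the inference cost in dollars for a single email."""
    input_cost = prompt_tokens * input_price_per_m / 1_000_000
    output_cost = completion_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# GPT-4o prices in dollars per million tokens, as quoted in the post.
GPT4O_INPUT = 2.50
GPT4O_OUTPUT = 10.00
EMAILS_PER_MONTH = 50_000

# Scenario 1: 800-token prompt, 200-token completion.
short_emails = cost_per_email(800, 200, GPT4O_INPUT, GPT4O_OUTPUT)
# Scenario 2: 1,500-token prompt; the 375-token completion is an assumption.
long_emails = cost_per_email(1_500, 375, GPT4O_INPUT, GPT4O_OUTPUT)

print(f"short: ${short_emails:.4f}/email, ${short_emails * EMAILS_PER_MONTH:.2f}/month")
print(f"long:  ${long_emails:.4f}/email, ${long_emails * EMAILS_PER_MONTH:.2f}/month")
```

Swapping in your own token counts and prices is usually enough to see whether the default model is affordable before writing any integration code.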

For a side project, fine. For a client engagement where I'm eating the cost during development and then trying to hand them a sustainable bill afterward? Painful.

So I went looking for alternatives, and I found something I wish I'd discovered months earlier.

The 184-Model Buffet Nobody Talks About

I stumbled onto Global API while doom-scrolling a dev forum at 1 AM. Their pitch was straightforward: 184 AI models, one unified SDK, one bill, prices ranging from $0.01 to $3.50 per million tokens depending on the model. The "one bill" part got my attention because I was juggling three different API providers for three different clients and losing my mind at invoice time.

I started pricing out my email assistant workload against their catalog. Here's the table I built, which I now have open in a tab at all times:

Model	Input ($/M)	Output ($/M)	Context Window
DeepSeek V4 Flash	0.27	1.10	128K
DeepSeek V4 Pro	0.55	2.20	200K
Qwen3-32B	0.30	1.20	32K
GLM-4 Plus	0.20	0.80	128K
GPT-4o	2.50	10.00	128K

Let me do the math for you, because this is the part that pays my rent. Same workload — 50,000 emails, 800-token prompts, 200-token completions:

DeepSeek V4 Flash: $10.80 input + $11.00 output = $21.80/month
DeepSeek V4 Pro: $22.00 input + $22.00 output = $44.00/month
Qwen3-32B: $12.00 input + $12.00 output = $24.00/month
GLM-4 Plus: $8.00 input + $8.00 output = $16.00/month
GPT-4o: $100.00 input + $100.00 output = $200.00/month

That GLM-4 Plus line item made me put my coffee down. From $200/month down to $16/month for the exact same job. That's a 92% cost reduction on this single workload, and the quality difference for email classification is essentially noise.
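The per-model figures are easy to re-derive from the price table. The sketch below assumes only the numbers already quoted (50,000 emails, 800-token prompts, 200-token completions); the names `PRICES` and `monthly_cost` are illustrative.

```python
# Prices in dollars per million tokens (input, output), copied from the table above.
PRICES = {
    "DeepSeek V4 Flash": (0.27, 1.10),
    "DeepSeek V4 Pro": (0.55, 2.20),
    "Qwen3-32B": (0.30, 1.20),
    "GLM-4 Plus": (0.20, 0.80),
    "GPT-4o": (2.50, 10.00),
}

EMAILS = 50_000
PROMPT_TOKENS = 800
COMPLETION_TOKENS = 200


def monthly_cost(input_price, output_price):
    """Monthly cost in dollars for the workload defined above."""
    input_millions = EMAILS * PROMPT_TOKENS / 1_000_000       # 40M input tokens
    output_millions = EMAILS * COMPLETION_TOKENS / 1_000_000  # 10M output tokens
    return input_millions * input_price + output_millions * output_price


costs = {name: monthly_cost(*prices) for name, prices in PRICES.items()}
gpt4o_cost = costs["GPT-4o"]

# Print cheapest first, with the saving relative to GPT-4o.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    saving = (1 - cost / gpt4o_cost) * 100
    print(f"{name:<18} ${cost:>7.2f}/month  ({saving:.0f}% below GPT-4o)")
```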

For my client, the monthly bill dropped from $487 to $283, and I kept my margin the same. They got a better deal, I made the same per hour, and the only thing that changed was which model I pointed at the prompt.

The Code, Because I Know You Skimmed Past the Math

Here's the actual integration. It's almost embarrassingly simple because Global API speaks the OpenAI SDK protocol. You can swap providers without rewriting a single line of business logic.

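import os

import openai

# Global API speaks the OpenAI protocol, so the stock OpenAI SDK works
# once base_url points at their endpoint.
client = openai.OpenAI(
    base_url="https://global-apis.com/v1",
    api_key=os.environ["GLOBAL_API_KEY"],
)

# The only labels the system prompt allows.
CATEGORIES = {"billing", "support", "sales", "urgent", "other"}

def classify_email(subject: str, body: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V4-Flash",
        messages=[
            {
                "role": "system",
                "content": "You are an email classifier. Categorize the email as one of: billing, support, sales, urgent, or other. Reply with one word only."
            },
            {
                "role": "user",
                "content": f"Subject: {subject}\n\nBody: {body}"
            }
        ],
        max_tokens=10,  # a one-word answer needs very few tokens
    )
    # content can be None, and the model may add punctuation or extra words,
    # so normalize the label and fall back to "other" for anything unexpected.
    label = (response.choices[0].message.content or "").strip().lower().strip(".")
    return label if label in CATEGORIES else "other"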

That's my real classifier. DeepSeek V4 Flash at $0.27/M input and $1.10/M output handles about 90% of my email routing without ever needing to escalate. The "urgent" category triggers a follow-up call to GPT-4o via the same client to draft a reply, but only for the 5% of emails that actually need it.

Wait, let me show you the escalation path too, because tiered routing is where the real savings live:

def smart_email_handler(subject: str, body: str) -> dict:
    # First pass: the cheap model assigns a category.
    category = classify_email(subject, body)

    # "other" means the cheap model punted, so ask the bigger model.
    if category == "other":
        response = client.chat.completions.create(
            model="deepseek-ai/DeepSeek-V4-Pro",
            messages=[
                {"role": "system", "content": "Categorize this ambiguous email and suggest an action."},
                {"role": "user", "content": f"Subject: {subject}\n\nBody: {body}"}
            ],
        )
        return {
            "category": "escalated",
            "suggestion": response.choices[0].message.content
        }

    # Clear-cut categories need no second call.
    return {"category": category, "suggestion": None}

Two models, one client, one bill. The "expensive" model only fires when the cheap one punts. This is the architecture pattern I now use on every AI project, and the client bill has stayed sane.

The Five Things I Wish I'd Done Sooner

After running this in production for four months, here's the playbook I extracted. None of it is rocket science, but every line is something I learned the hard way:

1. Cache like your margin depends on it, because it does.
Inbound email traffic is brutally repetitive. "Where is my order?" arrives 200 times a day with slight variations. I hash the subject + first 100 chars of body and store the model response in Redis. My cache hit rate settled at 40% after about a week, and that single change cut my inference bill by another 35%. Billable hours saved on debugging cache invalidation: roughly zero. Billable hours saved on infrastructure costs: actual real money.
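A rough sketch of that caching idea, not the production code. It hashes the subject plus the first 100 characters of the body, as described above. Everything else is an assumption: the names `cache_key` and `CachedClassifier` are hypothetical, lowercasing and whitespace collapsing are my additions to catch "slight variations", and a plain dict stands in for Redis so the example runs without a server.

```python
import hashlib


def cache_key(subject: str, body: str) -> str:
    """Hash the subject plus the first 100 characters of the body."""
    # Assumption: lowercase and collapse whitespace so trivial variations
    # of the same message map to the same key.
    normalized = " ".join(f"{subject} {body[:100]}".lower().split())
    return "email:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CachedClassifier:
    """Wrap a classify(subject, body) callable with a key-value cache."""

    def __init__(self, classify, store=None):
        self.classify = classify
        # A dict stands in for Redis here; any mapping-like store would do.
        self.store = {} if store is None else store
        self.hits = 0
        self.misses = 0

    def __call__(self, subject: str, body: str) -> str:
        key = cache_key(subject, body)
        cached = self.store.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        result = self.classify(subject, body)
        self.store[key] = result
        return result

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With a real Redis backend you would likely also set an expiry on each key so stale classifications age out; the right TTL depends on how often your categories change.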

2. Stream the response, even when you don't need to.
Email drafting usually waits for the full response before showing the user anything. Streaming cuts perceived latency from 1.2 seconds to about 400ms for the first token, and users feel like the assistant is "fast" even when total generation time is identical. User satisfaction scores went up 18% just from this one change. I added it during a 30-minute billable increment and the client thinks I'm a wizard.
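For reference, streaming with an OpenAI-compatible client looks roughly like this. It is a sketch under assumptions: `stream_draft` and the `on_token` callback are hypothetical names, the system prompt is a placeholder, and `client` is expected to be an `openai.OpenAI` instance (or anything with the same `chat.completions.create` shape).

```python
def stream_draft(client, model, subject, body, on_token):
    """Stream a reply draft, handing each text fragment to on_token as it arrives."""
    stream = client.chat.completions.create(
        model=model,
        messages=[
            # Placeholder prompt, not the one from the real project.
            {"role": "system", "content": "Draft a short, polite reply to this email."},
            {"role": "user", "content": f"Subject: {subject}\n\nBody: {body}"},
        ],
        stream=True,  # ask the API for incremental chunks instead of one response
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text (for example the final one).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            on_token(delta)  # e.g. push the fragment to the UI immediately
    return "".join(parts)
```

The callback is what makes the assistant feel fast: the UI can render the first fragment while the rest is still generating, and the function still returns the full draft for logging or caching.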

3. Route by complexity, not by default model.
I keep a small Python function that scores prompt complexity based on length, presence of structured data, and a few keywords. Simple stuff goes to GLM-4 Plus at $0.20/M input. Medium stuff goes to DeepSeek V4 Flash. Anything requiring nuanced reasoning hits DeepSeek V4 Pro. The "GA-Economy" tier on Global API gives roughly 50% cost reduction on simple queries compared to mid-tier models, and for email classification specifically, the quality delta is in the noise.
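A complexity scorer along those lines might look like the sketch below. The three signals (length, structured data, keywords) come from the description above; the thresholds, the keyword list, and the regex are illustrative guesses, and `"glm-4-plus"` is a hypothetical model identifier, since the post never shows the real one.

```python
import re

# Illustrative keyword list; tune it against your own traffic.
REASONING_KEYWORDS = ("refund", "legal", "cancel", "escalate", "contract")

MODEL_BY_TIER = {
    "simple": "glm-4-plus",  # hypothetical identifier, check the provider's catalog
    "medium": "deepseek-ai/DeepSeek-V4-Flash",
    "complex": "deepseek-ai/DeepSeek-V4-Pro",
}

# Crude structured-data signals: braces, brackets, pipes, or long digit runs.
STRUCTURED_DATA = re.compile(r"[{}\[\]|]|\d{6,}")


def complexity_score(subject: str, body: str) -> int:
    """Score an email on length, structured data, and keyword hits."""
    text = f"{subject}\n{body}"
    score = 0
    if len(text) > 1500:
        score += 2
    elif len(text) > 400:
        score += 1
    if STRUCTURED_DATA.search(text):
        score += 1
    lowered = text.lower()
    score += sum(1 for keyword in REASONING_KEYWORDS if keyword in lowered)
    return score


def pick_model(subject: str, body: str) -> str:
    """Map the complexity score to a model tier."""
    score = complexity_score(subject, body)
    if score == 0:
        tier = "simple"
    elif score <= 2:
        tier = "medium"
    else:
        tier = "complex"
    return MODEL_BY_TIER[tier]
```

A hand-written scorer like this is cheap and transparent, but the cut-offs are only as good as the traffic they were tuned on, so it is worth re-checking them whenever the quality numbers move.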

4. Track quality like an adult.
I built a tiny dashboard that samples 1% of responses and runs a second model as a judge. My average benchmark score across the email workloads sits at 84.6%, which is what Global API's own benchmarks showed for these models on similar tasks. When I see a dip, I know to investigate before the client notices. This is one billable hour per week that has saved me at least three client conversations about "the AI is being weird lately."
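One way to sketch the sampling side of that dashboard, under stated assumptions: `should_sample`, `QualityTracker`, and the `judge` callable are hypothetical names, and the judge is expected to wrap whatever second-model call you use and return True when it agrees with the original label. Sampling is keyed on a hash of the message ID so the same email is always in or out of the sample.

```python
import hashlib


def should_sample(message_id: str, rate: float = 0.01) -> bool:
    """Deterministically select roughly `rate` of all message IDs."""
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform integer in [0, 2**64)
    return bucket < rate * 2**64


class QualityTracker:
    """Track how often a judge agrees with the sampled classifications."""

    def __init__(self, judge, rate: float = 0.01):
        self.judge = judge  # callable(subject, body, category) -> bool
        self.rate = rate
        self.sampled = 0
        self.agreed = 0

    def record(self, message_id: str, subject: str, body: str, category: str) -> None:
        if not should_sample(message_id, self.rate):
            return
        self.sampled += 1
        if self.judge(subject, body, category):
            self.agreed += 1

    def agreement(self):
        """Share of sampled responses the judge agreed with, or None if no samples."""
        return self.agreed / self.sampled if self.sampled else None
```

Keep in mind that a model-as-judge score measures agreement between two models, not ground truth, so an occasional human spot check of the sampled emails is still worthwhile.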

5. Build the fallback path on day one, not day 90.
Rate limits hit. They always do. I have a circuit breaker that swaps to a secondary model after three consecutive failures, and a queue that retries with exponential backoff. I have not had a single email-related outage since I built this. Before? Two in three months. The 1.2s average latency with 320 tokens/sec throughput is meaningless if your endpoint goes down during a Black Friday spike.
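A stripped-down sketch of that fallback path, assuming the three-failure threshold mentioned above. The names `FallbackRouter` and `retry_with_backoff` are hypothetical, the delays are placeholders, and the breaker here never closes on its own; a production version would add a cool-down before trying the primary model again and would catch only the provider's rate-limit and connection errors rather than every exception.

```python
import time


class FallbackRouter:
    """Send calls to a secondary handler after repeated primary failures."""

    def __init__(self, primary, secondary, failure_threshold: int = 3):
        self.primary = primary
        self.secondary = secondary
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def call(self, *args, **kwargs):
        # Breaker open: skip the primary entirely.
        if self.consecutive_failures >= self.failure_threshold:
            return self.secondary(*args, **kwargs)
        try:
            result = self.primary(*args, **kwargs)
        except Exception:
            self.consecutive_failures += 1
            raise
        self.consecutive_failures = 0  # any success resets the count
        return result

    def reset(self) -> None:
        """Close the breaker manually, e.g. after a cool-down period."""
        self.consecutive_failures = 0


def retry_with_backoff(fn, attempts: int = 5, base_delay: float = 0.5, sleep=time.sleep):
    """Call fn until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)
```

Wrapping `router.call(...)` in `retry_with_backoff` gives the combined behavior: the first few failures are retried with growing delays, and once the threshold is reached the next retry lands on the secondary model.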

The Real Numbers After Four Months

Let me give you the honest before-and-after. Same client, same email volume (which has grown from 50K to about 78K emails per month as their business scales):

Before (GPT-4o everything): $312/month on inference
After (tiered routing through Global API): $89/month on inference
Savings: $223/month, which annualizes to $2,676

That $2,676/year is what I now use to negotiate a slightly higher hourly rate with my next client. The cost-savings story is the easiest upsell in the world when you can show real numbers from a previous engagement. "I can build this for you and here's my track record on similar projects" is infinitely more powerful than "I can build this for you, trust me."

Total time to integrate Global API into my existing client project: under 10 minutes. The SDK is a drop-in replacement for the OpenAI client, so I changed three lines of code and pointed everything at a new base URL. Most of my "integration time" was spent refactoring my routing logic to take advantage of multiple models, which I would have done eventually anyway.

What I'd Tell Past Me

If I could send a message back to the version of me that started this project six months ago, it would be this: stop treating the model choice as a one-time decision. The AI API
