
I Built a Local AI Gateway That Talks to Claude, ChatGPT, DeepSeek and Gemini — Without a Single API Key
Muhammad Ali · 2026-05-25 · via DEV Community


Every developer building with AI hits the same wall eventually.

You're prototyping something. It's working. Then the bill arrives — or worse, the rate limit. You stare at 429 RESOURCE_EXHAUSTED and think: there has to be another way.

There is. And it's sitting right on your desktop.


The Insight Nobody Talks About

Every major AI company gives you free access through their UI. Claude has a desktop app. ChatGPT has a desktop app. DeepSeek and Gemini run in your browser. You log in, you type, you get a reply. Completely free.

So I asked myself: why am I paying for API access when the same model is available for free one layer above?

The answer: because there's no programmatic way to use it.

So I built one.


What AI Gateway Does

AI Gateway is a local Flask server that sits between your application and the AI desktop apps on your machine. You send it an HTTP request. It controls the desktop app using OS-level automation, types your query, waits for the reply, extracts it, and returns it as JSON.

Your App / Terminal / Browser
        ↓
POST http://localhost:5000/ask
        ↓
AI Gateway Server (Flask + Queue)
        ↓
Auto-detects OS → routes to correct handler
        ↓
Controls AI Desktop App (Claude / ChatGPT / DeepSeek / Gemini)
        ↓
Returns reply as JSON


No API key. No billing. No per-token API rate limits. Just your existing free account doing what it already does, except now your code can talk to it.
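To make the flow above concrete, here is a minimal sketch of the logic an /ask endpoint could run. It is not the project's actual code: `handle_ask`, the `HANDLERS` table, the `error` field, and the HTTP status codes are all hypothetical, and treating `chars` as the reply length is an assumption. In a Flask app this function would sit inside a route; here it is a plain function so it runs without any dependencies.

```python
from typing import Callable, Dict, Tuple

# Hypothetical handler type: takes the query text, returns the reply text.
Handler = Callable[[str], str]


def handle_ask(payload: dict, handlers: Dict[Tuple[str, str], Handler]) -> Tuple[dict, int]:
    """Validate an /ask payload, run the matching handler, build the JSON body."""
    query = str(payload.get("query", "")).strip()
    ai = str(payload.get("ai", "")).lower()
    mode = str(payload.get("mode", "incognito")).lower()

    if not query:
        return {"status": "error", "error": "query is required"}, 400
    handler = handlers.get((ai, mode))
    if handler is None:
        return {"status": "error", "error": f"unsupported ai/mode: {ai}/{mode}"}, 400

    try:
        reply = handler(query)  # a real handler would drive the desktop app here
    except Exception as exc:  # UI automation can fail in many ways
        return {"status": "error", "error": str(exc)}, 502

    body = {
        "status": "ok",
        "ai": ai,
        "mode": mode,
        "query": query,
        "reply": reply,
        "chars": len(reply),  # assumption: "chars" is the reply length
    }
    return body, 200


# Stand-in handler so the sketch runs without any desktop app.
def fake_claude(query: str) -> str:
    return f"(stub reply to: {query})"


HANDLERS = {("claude", "incognito"): fake_claude}
body, status = handle_ask({"query": "Explain recursion", "ai": "claude"}, HANDLERS)
print(status, body["reply"])
```

Keeping validation and response building separate from the automation makes the fragile part (the handler) easy to swap or stub out in tests.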


How to Use It

Setup (5 minutes)

git clone https://github.com/malikasana/ai-gateway
cd ai-gateway
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
copy .env.example .env
python server.py


Server starts at http://localhost:5000.

Make sure your AI apps are open and logged in before starting.

Send a query from Python

import requests

response = requests.post("http://localhost:5000/ask", json={
    "query": "Explain recursion in one paragraph",
    "ai": "claude",
    "mode": "incognito"
})

print(response.json()["reply"])


Works with claude, chatgpt, deepseek, and gemini. Switch the ai field and you're talking to a different model.
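Since a desktop-automation round trip can take a while, a small client wrapper with local validation and a generous timeout may be useful. The sketch below is one possible shape, not part of the project: `build_payload`, `ask`, `SUPPORTED_AIS`, and the 120-second default are all assumptions. It uses the standard library's `urllib` so it has no extra dependency; the `requests` call shown earlier works just as well.

```python
import json
import urllib.request

SUPPORTED_AIS = ("claude", "chatgpt", "deepseek", "gemini")
GATEWAY_URL = "http://localhost:5000/ask"


def build_payload(query: str, ai: str, mode: str = "incognito") -> dict:
    """Validate inputs locally before sending anything to the gateway."""
    if ai not in SUPPORTED_AIS:
        raise ValueError(f"unknown ai {ai!r}; expected one of {SUPPORTED_AIS}")
    if not query.strip():
        raise ValueError("query must not be empty")
    return {"query": query, "ai": ai, "mode": mode}


def ask(query: str, ai: str, timeout: float = 120.0) -> str:
    """POST one query to the gateway and return the reply text."""
    data = json.dumps(build_payload(query, ai)).encode("utf-8")
    request = urllib.request.Request(
        GATEWAY_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout is an assumption; UI automation is slow, so keep it generous.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["reply"]


# Example (needs the gateway running and the apps logged in):
#     for name in SUPPORTED_AIS:
#         print(name, "->", ask("Explain recursion in one paragraph", name))
```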

Response format

{
  "status": "ok",
  "ai": "claude",
  "mode": "incognito",
  "query": "Explain recursion in one paragraph",
  "reply": "Recursion is...",
  "chars": 240
}

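On the client side, the response above can be parsed into a typed object so that a failed call surfaces as an exception instead of a missing key. This is only a sketch: `GatewayReply` and `parse_reply` are hypothetical names, and what the gateway returns on failure is not documented in this post, so the check simply treats any status other than "ok" as an error.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayReply:
    ai: str
    mode: str
    query: str
    reply: str
    chars: int


def parse_reply(raw: str) -> GatewayReply:
    """Turn the gateway's JSON text into a GatewayReply, or raise on failure."""
    body = json.loads(raw)
    if body.get("status") != "ok":
        # Assumption: anything other than "ok" means the automation failed.
        raise RuntimeError(f"gateway reported failure: {body}")
    return GatewayReply(
        ai=body["ai"],
        mode=body["mode"],
        query=body["query"],
        reply=body["reply"],
        chars=int(body["chars"]),
    )


sample = (
    '{"status": "ok", "ai": "claude", "mode": "incognito", '
    '"query": "Explain recursion in one paragraph", '
    '"reply": "Recursion is...", "chars": 240}'
)
parsed = parse_reply(sample)
print(parsed.ai, parsed.chars)  # prints: claude 240
```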

Browser UI

Open http://localhost:5000 in your browser. There's a built-in UI — select your AI, type your query, hit Send. Works on mobile too if you expose it via ngrok.

Public access via ngrok

ngrok http 5000


Now you can hit your local gateway from your phone, a remote server, anywhere.


The Architecture

The project is small but deliberately structured:

ai-gateway/
├── server.py              # Flask server, /ask and /health endpoints
├── queue_manager.py       # One request at a time, OS detection, routing
├── templates/
│   └── index.html         # Browser UI
└── instances/
    ├── claude/windows/incognito.py
    ├── chatgpt/windows/incognito.py
    ├── deepseek/windows/incognito.py
    └── gemini/windows/incognito.py


Each AI has its own handler. The queue manager processes requests one at a time, because you can't have two things typing into Claude simultaneously. OS detection routes each request to the matching handler automatically, so the same API call is meant to work regardless of platform. Only Windows handlers exist today (Mac support coming).
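One way the OS-based routing could work, given the directory tree above, is to map the detected platform to a folder name and build the handler's module path from it. The sketch below assumes that layout (instances/<ai>/<os>/<mode>.py); `detect_os`, `handler_module`, and the "mac" and "linux" folder names are hypothetical, since only Windows handlers are listed.

```python
import platform
from typing import Optional

# platform.system() returns "Windows", "Darwin", or "Linux" on common systems.
# Only "windows" appears in the tree above; the other folder names are guesses.
OS_NAMES = {"Windows": "windows", "Darwin": "mac", "Linux": "linux"}


def detect_os() -> str:
    """Map the running platform to a handler folder name."""
    system = platform.system()
    if system not in OS_NAMES:
        raise RuntimeError(f"unrecognized platform: {system!r}")
    return OS_NAMES[system]


def handler_module(ai: str, mode: str = "incognito", os_name: Optional[str] = None) -> str:
    """Build the dotted module path for the handler that should serve a request."""
    os_name = os_name or detect_os()
    return f"instances.{ai}.{os_name}.{mode}"


print(handler_module("claude", os_name="windows"))
# prints: instances.claude.windows.incognito
```

A router could then pass that path to `importlib.import_module` and call whatever entry point the handler exposes, which keeps adding a new platform down to adding a new folder.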


What I Learned Building This

Desktop automation is fragile but powerful. Every AI app has its own quirks. DeepSeek needed a Copy button workaround for reliable reply extraction. Gemini's Chrome automation behaves differently from the desktop apps. Each handler required its own approach.
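The post does not show how each handler decides that a reply is finished. One common approach, sketched here purely as an assumption, is to poll the visible text and return once it has stopped changing for a few consecutive reads. `wait_for_stable_text` and `read_text` are hypothetical; `read_text` stands in for whatever a handler uses to grab the reply (for example, the Copy button workaround mentioned for DeepSeek).

```python
import time
from typing import Callable


def wait_for_stable_text(
    read_text: Callable[[], str],
    stable_reads: int = 3,
    interval: float = 0.5,
    timeout: float = 120.0,
) -> str:
    """Poll read_text() until it returns the same non-empty text
    stable_reads times in a row, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    last = ""
    streak = 0
    while time.monotonic() < deadline:
        current = read_text()
        if current and current == last:
            streak += 1
            if streak >= stable_reads:
                return current
        else:
            # Text changed (or is still empty): restart the count.
            streak = 1 if current else 0
            last = current
        time.sleep(interval)
    raise TimeoutError("reply did not stabilize before the timeout")


# Simulated streaming reply, so the sketch runs without any desktop app.
frames = iter(["", "Rec", "Recursion", "Recursion is...", "Recursion is...", "Recursion is..."])


def fake_reader() -> str:
    return next(frames, "Recursion is...")


print(wait_for_stable_text(fake_reader, interval=0.0))  # prints: Recursion is...
```

The trade-off is latency: a larger `stable_reads` or `interval` reduces the chance of returning a half-streamed reply but makes every request slower.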

Queue management matters more than you think. Early versions had race conditions where two simultaneous requests would collide mid-automation. The queue enforces serial execution cleanly.
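Serial execution of this kind can be illustrated with a single-worker thread pool: requests may arrive concurrently, but only one job touches the desktop at a time, in arrival order. This is a sketch of the idea, not the project's `queue_manager.py`; `fake_automation` is a stand-in that sleeps instead of driving an app.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One worker thread means jobs run strictly one after another, in submit order.
executor = ThreadPoolExecutor(max_workers=1)

active = 0      # jobs currently running
max_active = 0  # highest concurrency ever observed


def fake_automation(name: str) -> str:
    """Pretend to type into an app; records how many jobs overlap."""
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    time.sleep(0.05)  # stands in for typing and waiting for the reply
    active -= 1
    return f"reply from {name}"


# Three requests arrive at once; the pool queues them behind each other.
futures = [executor.submit(fake_automation, n) for n in ("claude", "chatgpt", "gemini")]
replies = [f.result() for f in futures]
executor.shutdown()

print(replies)
# prints: ['reply from claude', 'reply from chatgpt', 'reply from gemini']
print("max concurrent jobs:", max_active)
# prints: max concurrent jobs: 1
```

With more than one worker, `max_active` could exceed 1, which is the collision the queue is there to prevent.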

The free tier is genuinely generous. During development and testing I sent hundreds of queries across all four models. Zero cost. The free tiers from these companies are substantial if you use them through the UI rather than the API.


Honest Limitations

This isn't a production API replacement. Be clear-eyed about what it is:

  • One request at a time — queue-based, not concurrent
  • Requires desktop apps open — it's automation, not an API call
  • Windows only right now — Mac support is in progress
  • No conversation memory yet — each query is stateless (stateful mode coming)
  • Fragile to UI changes — if Claude updates their desktop app layout, the handler may break

If you need high-throughput production AI calls, use the official APIs. This is for developers who want to prototype, experiment, build side projects, or simply can't afford API costs right now.


Current Status and Roadmap

✅ Claude — Windows incognito mode

✅ ChatGPT — Windows incognito mode

✅ DeepSeek — Windows incognito mode

✅ Gemini — Windows incognito mode

⬜ Mac support for all AIs

⬜ Stateful mode (persistent conversations)

⬜ Browser UI improvements


Get It

GitHub: github.com/malikasana/ai-gateway