GitHub - Arnab758/ai-gateway
arnab777 · 2026-06-23 · via Show HN

Cut your LLM API costs by 40-70% with zero code changes.

A semantic caching layer that sits between your app and AI providers (OpenAI, Groq, etc.). When you ask a similar question twice, it returns the cached answer instantly instead of calling the API again.

🎯 What Problem Does This Solve?

You're building an AI app and your API bill is $500/month. 40-70% of that is for repeat questions:

  • "What is RAG?" asked 100 times = 100 API calls
  • "How do I reset my password?" asked 50 times = 50 API calls

With AI Gateway: Those 150 calls become 2 calls (one for each unique question). You save $200-350/month.

🚀 Deploy in 60 Seconds (3 Options)

Option 1: Railway (Recommended - Includes Redis)

Deploy to Railway

Steps:

  1. Click the button above
  2. Sign in with GitHub
  3. Enter your API key (Groq or OpenAI)
  4. Click "Deploy"
  5. Done! Your gateway is live at https://your-app.up.railway.app

What you get:

  • ✅ Hosted gateway (no server management)
  • ✅ Redis included (persistent cache)
  • ✅ Auto-scaling
  • ✅ HTTPS enabled
  • ✅ $5/month free credit

Option 2: Render (One-Click Deploy)

Deploy to Render

Steps:

  1. Click the button
  2. Sign in with GitHub
  3. Add environment variable: UPSTREAM_API_KEY=your_key
  4. Click "Create Web Service"
  5. Done!

Note: You'll need to add a Redis add-on separately in the Render dashboard.

Option 3: Docker (Self-Hosted)

Prerequisites:

  • Docker installed
  • Docker Compose installed
  • A Groq or OpenAI API key

Steps:

# 1. Clone the repo
git clone https://github.com/Arnab758/ai-gateway.git
cd ai-gateway

# 2. Set your API key
export UPSTREAM_API_KEY=gsk_your_groq_key_here

# 3. Start everything (gateway + Redis)
docker compose up -d

# 4. Verify it's running
curl http://localhost:8080/health

# Expected response: {"status":"ok"}

That's it! Your gateway is now running at http://localhost:8080

📖 How to Use

Basic Usage (cURL)

# Send a request through the gateway (-i prints the response headers)
curl -i -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Gateway-Token: my-app" \
  -H "Authorization: Bearer sk-your-openai-or-groq-key" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is RAG?"}]
  }'

# Send the SAME request again
# The response headers will now show: X-Gateway-Cache: HIT
# You just saved money! 💰

Python Example

import requests

# Your gateway URL (from Railway/Render/Docker)
GATEWAY_URL = "https://your-app.up.railway.app"
API_KEY = "sk-your-key"

response = requests.post(
    f"{GATEWAY_URL}/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "X-Gateway-Token": "my-app",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "What is RAG?"}],
    },
    timeout=30,  # don't hang forever if the gateway is unreachable
)
response.raise_for_status()  # surface HTTP errors instead of printing an error body

print(response.headers.get("X-Gateway-Cache"))  # HIT or MISS
print(response.json())

Node.js Example

const response = await fetch('https://your-app.up.railway.app/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Gateway-Token': 'my-app',
    'Authorization': 'Bearer sk-your-key'
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'What is RAG?' }]
  })
});

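// Fail loudly on HTTP errors instead of treating an error body as a completion
if (!response.ok) {
  throw new Error(`Gateway returned HTTP ${response.status}`);
}

const data = await response.json();
console.log(response.headers.get('X-Gateway-Cache')); // HIT or MISS
console.log(data);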

🎮 Try the Interactive Demo

No API key needed! See how caching works:

👉 Open Live Demo

  • Type a prompt and click "Send" (simulation mode)
  • Or enter your API key and click "Test Real API" (real caching with Redis)
  • Try sending the same prompt twice to see cache hits!

🔥 Key Features

  • Semantic Caching - Matches similar questions, not just exact duplicates
    • "What is RAG?" = "Explain RAG" = "RAG definition"
  • Multi-Tenant - Each customer gets their own isolated cache
  • 4-Tier Matching:
    1. Exact match (100% identical)
    2. Template match ("weather in London" = "weather in Paris")
    3. Semantic match (similar meaning)
    4. Word overlap (partial matches)
  • Redis + In-Memory Fallback - Works with or without Redis
  • Request Deduplication - 100 concurrent identical requests = 1 API call
  • Rate Limiting - Prevent abuse per tenant
  • Circuit Breaker - Automatically stops calling if provider is down
  • Cost Tracking - See how much you saved
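
To make the tiered lookup more concrete, here is a minimal sketch of two of the four tiers: exact match and word overlap. It is not the gateway's actual code. The class and function names are hypothetical, "exact" here means identical after lowercasing and stripping punctuation (an assumption), the 0.75 default is borrowed from the jaccard_threshold shown in the /stats output, and the template and semantic tiers are left out because they need a template extractor and an embedding model.

```python
import hashlib
import re


def normalize(prompt):
    """Lowercase and drop punctuation so trivial variants compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", prompt.lower()))


def jaccard(a, b):
    """Word-overlap score: |A & B| / |A | B| over the two prompts' word sets."""
    set_a, set_b = set(normalize(a).split()), set(normalize(b).split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


class TieredCache:
    """Hypothetical two-tier lookup: exact match first, then word overlap."""

    def __init__(self, jaccard_threshold=0.75):
        self.jaccard_threshold = jaccard_threshold
        self.entries = {}  # hash of normalized prompt -> (prompt, response)

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(normalize(prompt).encode()).hexdigest()

    def put(self, prompt, response):
        self.entries[self._key(prompt)] = (prompt, response)

    def get(self, prompt):
        """Return (response, tier) on a hit, or (None, "miss")."""
        exact = self.entries.get(self._key(prompt))
        if exact is not None:
            return exact[1], "exact"
        # Fall back to the best word-overlap score across all cached prompts.
        best_score, best_response = 0.0, None
        for cached_prompt, response in self.entries.values():
            score = jaccard(prompt, cached_prompt)
            if score > best_score:
                best_score, best_response = score, response
        if best_score >= self.jaccard_threshold:
            return best_response, "overlap"
        return None, "miss"
```

The linear scan in get is fine for a sketch but would not scale to a large cache; a real implementation would presumably index entries rather than compare against every cached prompt.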

📊 Real-World Example

Scenario: Customer support chatbot with 10,000 users

Without AI Gateway:

  • 10,000 users ask 100 common questions each
  • 1,000,000 API calls/month
  • Cost: $500/month (at $0.0005/call)

With AI Gateway:

  • First user's 100 questions: 100 API calls (cache miss)
  • The other 9,999 users asking the same questions: 0 API calls (cache hit)
  • Total: 100 API calls/month, assuming cached entries do not expire during the month
  • Cost: $0.05/month
  • Savings: $499.95/month (99.99%)

Even with 30% unique questions:

  • 300,000 API calls
  • Cost: $150/month
  • Savings: $350/month (70%)
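
The arithmetic in this scenario can be reproduced with a few lines of Python. This is only a worked calculation using the numbers above (1,000,000 requests at $0.0005 per call); the function name is made up for illustration, and real savings depend on your traffic and on how long entries stay cached.

```python
def monthly_savings(total_requests, upstream_calls, price_per_call):
    """Compare the bill without caching to the bill when only
    `upstream_calls` of the requests actually reach the provider."""
    baseline = round(total_requests * price_per_call, 2)
    with_cache = round(upstream_calls * price_per_call, 2)
    savings = round(baseline - with_cache, 2)
    return {
        "baseline": baseline,
        "with_cache": with_cache,
        "savings": savings,
        "savings_pct": round(100 * savings / baseline, 2),
    }


# Best case: only the 100 distinct questions reach the provider.
print(monthly_savings(1_000_000, 100, 0.0005))

# 30% of requests are unique and miss the cache.
print(monthly_savings(1_000_000, 300_000, 0.0005))
```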

🛠️ Configuration

Edit gateway.yaml to customize:

cache:
  redis_url: "redis://localhost:6379"  # Or your Redis URL
  vector:
    enabled: true
    similarity_threshold: 0.85  # 85% similar = cache hit
  ttl_hours: 24  # Cache entries expire after 24 hours

rate_limiter:
  enabled: true
  max_requests: 60  # Per minute per tenant
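
For intuition about what "60 per minute per tenant" means, here is a small fixed-window limiter sketch. The README does not say which algorithm the gateway uses, so the fixed-window approach, the class name, and the injectable clock are all assumptions made for illustration.

```python
import time


class FixedWindowRateLimiter:
    """Hypothetical per-tenant limiter: at most max_requests per window."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._windows = {}  # tenant -> (window_start, request_count)

    def allow(self, tenant):
        """Return True if this tenant may make another request right now."""
        now = self.clock()
        start, count = self._windows.get(tenant, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the old window has ended, start a new one
        if count >= self.max_requests:
            self._windows[tenant] = (start, count)
            return False
        self._windows[tenant] = (start, count + 1)
        return True
```

Each tenant has its own counter, so one noisy tenant exhausting its budget does not affect the others.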

📡 API Endpoints

Endpoint              Method  Description
/v1/chat/completions  POST    Main proxy endpoint with caching
/health               GET     Health check
/stats                GET     Cache statistics
/metrics              GET     Prometheus metrics

🔍 Monitoring

Check Cache Stats

curl http://localhost:8080/stats

Response:

{
  "uptime": 1234567890,
  "cache": {
    "local_index_entries": 150,
    "vector_dimensions": 128,
    "vector_threshold": 0.85,
    "jaccard_threshold": 0.75,
    "template_enabled": true,
    "dedup_enabled": true,
    "ttl_hours": 24
  }
}

Response Headers

Every response includes cache information:

X-Gateway-Cache: HIT          # or MISS
X-Gateway-Similarity: 0.95    # 95% similar (if HIT)
X-Gateway-Time-Saved: 1234ms  # Time saved (if HIT)
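
A client can read these headers to log its own hit rate. The helper below is a sketch: the function name is hypothetical, and it assumes the value formats are exactly as shown above (a decimal similarity and a time with an "ms" suffix).

```python
def parse_cache_headers(headers):
    """Turn the gateway's cache headers into a small dict.

    `headers` is any mapping with a .get method, such as a plain dict
    or the `response.headers` object from the requests library.
    """
    hit = headers.get("X-Gateway-Cache", "MISS").strip().upper() == "HIT"
    similarity = headers.get("X-Gateway-Similarity")
    time_saved = headers.get("X-Gateway-Time-Saved")

    time_saved_ms = None
    if hit and time_saved:
        value = time_saved.strip()
        if value.lower().endswith("ms"):
            value = value[:-2]  # drop the unit suffix
        time_saved_ms = int(value)

    return {
        "hit": hit,
        "similarity": float(similarity) if hit and similarity else None,
        "time_saved_ms": time_saved_ms,
    }
```

Note that a plain dict is case-sensitive, whereas the headers object from requests is not, so with a plain dict the header names must match the casing shown above.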

🐛 Troubleshooting

Problem: "Redis connection failed"

Solution: Redis is optional! The gateway will fall back to in-memory cache automatically. For production, add Redis:

  • Railway: Add Redis from the "New" button
  • Render: Add Redis via "New" → "Database" → "Redis"
  • Docker: Already included in docker-compose.yml

Problem: "All upstream providers unavailable"

Cause: You're hitting rate limits on the provider's free tier (Groq/OpenAI)

Solutions:

  1. Wait 1-2 minutes and try again
  2. Upgrade to a paid tier ($0.002/request vs free limits)
  3. Add your own API key with higher limits
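
The "wait and try again" advice can be automated on the client side. The following is a sketch, not part of the gateway: UpstreamUnavailable is a hypothetical exception your own client code would raise when it sees this error, and the 60-second base delay is chosen to match the 1-2 minute wait suggested above.

```python
import time


class UpstreamUnavailable(Exception):
    """Hypothetical error raised by your client when the gateway reports
    that all upstream providers are unavailable."""


def call_with_backoff(send, max_attempts=3, base_delay=60.0, sleep=time.sleep):
    """Call send(); on UpstreamUnavailable wait base_delay, then twice that,
    and so on, re-raising the error once max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return send()
        except UpstreamUnavailable:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults this waits 60 seconds after the first failure and 120 seconds after the second. The sleep parameter is injectable so the helper can be exercised in tests without actually waiting.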

Problem: "Rate limit exceeded"

Cause: Too many requests from one tenant

Solution: Increase rate limits in gateway.yaml:

rate_limiter:
  max_requests: 120  # Increase from 60
  window_minutes: 1

Problem: Cache not hitting

Cause: Prompts are too different

Solution: Lower the similarity threshold in gateway.yaml:

cache:
  vector:
    similarity_threshold: 0.75  # Lower from 0.85
  jaccard:
    threshold: 0.65  # Lower from 0.75
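
To see why lowering the threshold produces more hits, consider how a similarity score is compared against it. The sketch below assumes the vector tier scores prompts with cosine similarity between embedding vectors; the README does not state the metric, and the two-dimensional vectors are toy values rather than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_cache_hit(query_vec, cached_vec, threshold):
    """A cached entry counts as a hit when similarity reaches the threshold."""
    return cosine_similarity(query_vec, cached_vec) >= threshold


# Toy vectors whose cosine similarity is about 0.8.
query, cached = [1.0, 0.0], [0.8, 0.6]
print(is_cache_hit(query, cached, 0.85))  # miss at the stricter threshold
print(is_cache_hit(query, cached, 0.75))  # hit once the threshold is lowered
```

The trade-off is that a lower threshold also raises the chance of returning a cached answer to a question that only looks similar, so it is worth lowering in small steps.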

🏗️ Architecture

Your App → AI Gateway → [Cache Check] → Redis
                ↓
            [Cache HIT] → Return cached response (instant, $0)
                ↓
            [Cache MISS] → Call LLM Provider → Cache response → Return
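
The flow in the diagram is the classic cache-aside pattern. Here is a minimal in-memory sketch of it. The class name and the call_provider callback are hypothetical, matching is exact-string only, and a plain dict stands in for Redis, so this illustrates the control flow rather than the gateway's real implementation.

```python
import time


class CacheAsideGateway:
    """Check the cache first; on a miss, call the provider and store the result."""

    def __init__(self, call_provider, ttl_seconds=24 * 3600, clock=time.monotonic):
        self.call_provider = call_provider  # function: prompt -> response
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store = {}  # prompt -> (expires_at, response)

    def handle(self, prompt):
        """Return (response, "HIT") from cache or (response, "MISS") from upstream."""
        now = self.clock()
        entry = self._store.get(prompt)
        if entry is not None and entry[0] > now:
            return entry[1], "HIT"
        # Missing or expired: call the provider and cache the fresh response.
        response = self.call_provider(prompt)
        self._store[prompt] = (now + self.ttl_seconds, response)
        return response, "MISS"
```

For example, wrapping a function that counts its own invocations shows that a repeated prompt reaches the provider only once until the entry's TTL runs out.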

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repo
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📄 License

MIT License - feel free to use this commercially!

🙋 Support

Questions? Open an issue and I'll respond within 24 hours.

⭐ Star History

If this project helps you, please give it a star! It helps others find it.


Built with ❤️ for the AI community