GitHub - skorotkiewicz/llm-rt: A small Ruby prototype of an OpenAI-compatible LLM proxy with a refillable token bucket

modinfo · 2026-05-28 · via Hacker News - Newest: "LLM"

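This is a small Ruby prototype of an OpenAI-compatible LLM proxy with a refillable token bucket.

It uses only the Ruby standard library: no gems, no Rack, no WEBrick.

Run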

BASE_API_URL=http://192.168.0.124:8888/v1 \
BASE_API_KEY=1mmer \
BASE_MODEL=gemma4 \
ruby llm_proxy.rb

By default the proxy listens on 0.0.0.0:8899.

For the local LLM at 192.168.0.124:8888, run the saved local setup:

./run_local_proxy.sh

This starts the Ruby proxy at http://127.0.0.1:8899/v1, forwarding to http://192.168.0.124:8888/v1.

The saved local curl check:

./curl_local_proxy.sh

Manual equivalent:

curl -sS -i -m 60 http://127.0.0.1:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma4",
    "messages": [{"role": "user", "content": "Reply with exactly: proxy ok"}],
    "max_tokens": 16
  }'
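
The same check can be sketched with Ruby's standard library, in keeping with the project's no-gems approach. The helper name `build_chat_request` is hypothetical and not part of the repository. It only builds the request, so it can be inspected without a running proxy.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: build the POST request that the curl command above sends.
def build_chat_request(base_url, api_key, model, prompt, max_tokens: 16)
  base = base_url.end_with?("/") ? base_url : "#{base_url}/"
  uri = URI.join(base, "chat/completions")

  req = Net::HTTP::Post.new(uri)
  req["Authorization"] = "Bearer #{api_key}"
  req["Content-Type"] = "application/json"
  req.body = JSON.generate(
    "model" => model,
    "messages" => [{ "role" => "user", "content" => prompt }],
    "max_tokens" => max_tokens
  )
  [uri, req]
end

uri, req = build_chat_request("http://127.0.0.1:8899/v1", "user-a", "gemma4",
                              "Reply with exactly: proxy ok")

# Sending it requires the proxy to be running:
# res = Net::HTTP.start(uri.host, uri.port, read_timeout: 60) { |http| http.request(req) }
# puts res["X-RateLimit-Remaining"]
# puts res.body
```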

Verified through the proxy: the upstream answered with proxy ok, and the proxy returned X-RateLimit-Remaining: 0 for the local test bucket.

Run the smoke test:

ruby test_llm_proxy.rb

Token bucket settings

MAX_TOKENS=10                 # max saved tokens per user
REFILL_TOKENS=2               # tokens added each refill
REFILL_INTERVAL_SECONDS=300   # 5 minutes
REQUEST_TOKEN_COST=1          # cost per accepted completion request
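
How these four settings interact can be illustrated with a minimal bucket. This is a sketch under assumptions, not the code in llm_proxy.rb: the class name `TokenBucket`, the choice to start a new bucket full, and refilling in whole intervals are all guesses about reasonable behavior.

```ruby
# Hypothetical sketch of a refillable token bucket.
class TokenBucket
  attr_reader :tokens

  def initialize(max_tokens: 10, refill_tokens: 2, refill_interval: 300, now: Time.now.to_f)
    @max_tokens = max_tokens            # MAX_TOKENS
    @refill_tokens = refill_tokens      # REFILL_TOKENS
    @refill_interval = refill_interval  # REFILL_INTERVAL_SECONDS
    @tokens = max_tokens                # assumption: a new bucket starts full
    @last_refill = now
  end

  # Add refill_tokens for every whole interval elapsed, capped at max_tokens.
  def refill(now = Time.now.to_f)
    intervals = ((now - @last_refill) / @refill_interval).floor
    return if intervals <= 0

    @tokens = [@tokens + intervals * @refill_tokens, @max_tokens].min
    @last_refill += intervals * @refill_interval
  end

  # Try to spend `cost` tokens (REQUEST_TOKEN_COST); true means the request is accepted.
  def take(cost = 1, now = Time.now.to_f)
    refill(now)
    return false if @tokens < cost

    @tokens -= cost
    true
  end
end
```

With the defaults shown, a client that drains its bucket regains two requests every five minutes and can never save up more than ten.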

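Each bearer token gets its own bucket. Requests without a bearer token are bucketed by remote IP. Set PROXY_API_KEYS=key1,key2 if the proxy should reject unknown client keys.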
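
The bucket selection described above might look like the following. Both function names are hypothetical, `headers` is assumed to be a Hash with lower-cased header names, and rejecting requests that carry no key at all when PROXY_API_KEYS is set is an assumption rather than documented behavior.

```ruby
# Hypothetical helper: choose the bucket key for a request.
# Returns nil when the request should be rejected.
def bucket_key(headers, remote_ip, allowed_keys = nil)
  auth = headers["authorization"].to_s
  token = auth[/\ABearer\s+(\S+)\z/i, 1]

  if allowed_keys && !allowed_keys.empty?
    # Assumption: with an allow-list, unknown or missing keys are rejected.
    return nil unless token && allowed_keys.include?(token)
  end

  token ? "key:#{token}" : "ip:#{remote_ip}"
end

# Parse PROXY_API_KEYS=key1,key2 into an array (nil when unset or blank).
def allowed_keys_from_env(env = ENV)
  raw = env["PROXY_API_KEYS"]
  return nil if raw.nil? || raw.strip.empty?

  raw.split(",").map(&:strip).reject(&:empty?)
end
```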

When the bucket is empty, /v1/chat/completions and /v1/completions return a normal OpenAI-style assistant response:

limit reached, wait 5 min
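
For the chat endpoint, such a reply could be assembled roughly as below. Only the message text comes from the README. The function name `limit_reached_response`, the `id` value, and the zeroed `usage` block are assumptions chosen to resemble an OpenAI chat completion.

```ruby
require "json"

# Hypothetical builder for the "bucket empty" reply.
def limit_reached_response(model, wait_minutes)
  {
    "id" => "chatcmpl-ratelimit",   # placeholder id, not from the project
    "object" => "chat.completion",
    "created" => Time.now.to_i,
    "model" => model,
    "choices" => [
      {
        "index" => 0,
        "message" => {
          "role" => "assistant",
          "content" => "limit reached, wait #{wait_minutes} min"
        },
        "finish_reason" => "stop"
      }
    ],
    "usage" => { "prompt_tokens" => 0, "completion_tokens" => 0, "total_tokens" => 0 }
  }
end

# REFILL_INTERVAL_SECONDS=300 gives the 5 minutes in the message.
body = JSON.generate(limit_reached_response("gemma4", 300 / 60))
```

Returning an ordinary completion instead of an HTTP error means chat clients show the notice as a normal assistant message.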

Test request

curl http://localhost:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "anything",
    "messages": [{"role": "user", "content": "hello"}]
  }'

Optional estimated-token mode

By default, every request costs REQUEST_TOKEN_COST bucket tokens. To charge by prompt size and expected response size instead:

TOKEN_COST_MODE=estimate RESPONSE_TOKEN_RESERVE=256 ruby llm_proxy.rb

This is only a rough estimate for the prototype.
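
One way an estimate of this kind could be computed is sketched below. The function name `estimated_cost` and the four-characters-per-token heuristic are assumptions for illustration. The README does not say how llm_proxy.rb actually estimates the cost.

```ruby
# Hypothetical estimate: prompt characters divided by an assumed
# characters-per-token ratio, plus a fixed reserve for the response
# (RESPONSE_TOKEN_RESERVE).
def estimated_cost(messages, response_reserve: 256, chars_per_token: 4)
  prompt_chars = messages.sum { |m| m["content"].to_s.length }
  prompt_tokens = (prompt_chars.to_f / chars_per_token).ceil
  prompt_tokens + response_reserve
end

cost = estimated_cost([{ "role" => "user", "content" => "hello" }])
```

Under this sketch the bucket limits would need to be sized in model tokens rather than in request counts.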
