慣性聚合 高效追蹤和閱讀你感興趣的部落格、新聞、科技資訊
閱讀原文 在慣性聚合中打開

推薦訂閱源

小众软件
小众软件
博客园 - 叶小钗
有赞技术团队
有赞技术团队
大猫的无限游戏
大猫的无限游戏
博客园_首页
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
L
LangChain Blog
Hugging Face - Blog
Hugging Face - Blog
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
aimingoo的专栏
aimingoo的专栏
Blog — PlanetScale
Blog — PlanetScale
爱范儿
爱范儿
T
Tailwind CSS Blog
Jina AI
Jina AI
量子位
Stack Overflow Blog
Stack Overflow Blog
人人都是产品经理
人人都是产品经理
J
Java Code Geeks
V
Visual Studio Blog
月光博客
月光博客

Hacker News - Newest: "LLM"

GitHub - lechmazur/position_bias: A benchmark for testing whether LLM judges keep the same preference when two lightly edited versions of the same story are shown in opposite orders. Flex routing (EU and EFTA) Dark Factories: Retooling for LLM Velocity Ask HN: What would be the impact of a LLM output injection attack? GitHub - AronDaron/dataset-generator: No-code desktop app for generating high-quality synthetic datasets to fine-tune LLMs — plan-then-execute pipeline, LLM-as-judge, HuggingFace upload. GitHub - Oaklight/llm-rosetta: Production-ready LLM API translation layer for Python — bidirectional conversion between OpenAI, Anthropic & Google formats via hub-and-spoke IR. Optional API gateway. Streaming & non-streaming. Zero core deps. Contributions welcome! Faster LLM Inference via Sequential Monte Carlo GitHub - browser-use/browser-harness: Self-healing browser harness that enables LLMs to complete any task. GitHub - moeen-mahmud/remen: Remen turns thoughts into something you can return to TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference Analyzing 156 LLM Launch Posts on Hacker News ChatGPT vs Gemini vs Claude: The Best LLM Subscription You Should Buy GitHub - salaamalykum/quran-semantic-search: High-density RAG Semantic Search Engine & Quran Corpus (GEO/SEO Architecture) GitHub - NVIDIA/TensorRT-LLM: TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way. The State of LLM Bug Bounties in 2026 Operational Readiness Criteria for Tool-Using LLM Agents Meshcore: Architecture for a Decentralized P2P LLM Inference Network How an LLM becomes more coherent as we train it GitHub - seetrex-ai/laimark GitHub - Jossifresben/BibCrit: AI-assited biblical textual criticism GitHub - wastedcode/memex: File system based wiki, maintained by Claude 99helpers.com GitHub - cliver-project/AITrigram GitHub - unbody-io/adapt: A self-evolving memory layer for AI agents. GitHub - hb20007/awesome-gen-ai-fails: A list of incidents where reliance on generative AI and LLMs resulted in harm to companies, individuals, or society GitHub - nevenkordic/localmind: Run any local LLM with persistent memory and context. CLI agent over Ollama with SQLite-backed hybrid recall. No cloud. Ask HN: What are the machine requirements for a LLM like Llama-3.1-8B? HUOZIIME: An On-Device LLM-enhanced Input Method for Deep Personalization The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows grpo explained: group relative policy optimization for llm finetuning - cgft Stop comparing price per million tokens: the hidden LLM API costs · TensorZero Andrej Karpathy's LLM Wiki Is a Bad Idea GitHub - GG-QandV/mnemostroma: Offline RAM-first cognitive leer/coprocessor for AI agents and robotics. Solves "Context Abandonment" with 20-80ms latency using a dual-thread biomimetic memory architecture (ONNX + SQLite WAL). mempalace/agent at agent · skorotkiewicz/mempalace GitHub - Nyquest-ai/nyquest-rust-fullstack-pub: Nyquest — Semantic Compression Proxy for LLMs. 350+ rules, local LLM stage, 15-75% token savings. Full Rust stack. GitHub - TheoV823/mneme: Enforce architectural decisions in AI-assisted development. GitHub - klemenvod/TokenBrawl: A 1v1 Bomberman-style game where two LLM agents play autonomously against each other. No human plays — you watch the AIs fight. Each agent receives a text description of the board state, reasons about it, and outputs a move as JSON. The game engine executes it. LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users Introducing the Common AI Provider: LLM and AI Agent Support for Apache Airflow Power Circuit AI: Designing Power Electronic Circuits for Motor Drives with Generative Artificial Intelligence Ask HN: How to program with IDE and LLM on CPU locally? Show HN: Agent-cache – Multi-tier LLM/tool/session caching for Valkey and Redis High-Stakes Personalization: Rethinking LLM Customization for Individual Investor Decision-Making Bonsai 1-bit WebGPU - a Hugging Face Space by webml-community Ask HN: Simple tooling for local LLM code critique without IDE integration? Can a General LLM Diagnose a DICOM Slice? A 10-Case Public Benchmark Charts-of-Thought: Enhancing LLM Visualization Literacy (PDF, 2026) Mesh LLM Show HN: Springdrift – A persistent runtime for long-lived LLM agents Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation
GitHub - skorotkiewicz/llm-rt: 一個小型的 Ruby 原型,用於開發與 OpenAI 兼容的 LLM 代理,並配備可補充的 token 桶
modinfo · 2026-05-28 · via Hacker News - Newest: "LLM"

小型紅寶石原型,用於OpenAI相容的LLM代理程式,具有可補充的token桶.

它只使用Ruby標準函庫:沒有gem,沒有Rack,沒有WEBrick.

執行

BASE_API_URL=http://192.168.0.124:8888/v1 \
BASE_API_KEY=1mmer \
BASE_MODEL=gemma4 \
ruby llm_proxy.rb

代理程式預設監聽在0.0.0.0:8899.

對於您本地的LLM在192.168.0.124:8888,執行儲存的本地設定:

./run_local_proxy.sh

這會啟動Ruby代理程式在http://127.0.0.1:8899/v1 和轉發到 http://192.168.0.124:8888/v1.

儲存的本地 curl 檢查是:

./curl_local_proxy.sh

手動等價:

curl -sS -i -m 60 http://127.0.0.1:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma4",
    "messages": [{"role": "user", "content": "Reply with exactly: proxy ok"}],
    "max_tokens": 16
  }'

透過代理伺服器驗證結果:上游回應了 proxy ok,而代理伺服器回傳了 X-RateLimit-Remaining: 0,並使用本地測試儲存桶。

執行煙測試:

ruby test_llm_proxy.rb

令牌儲存桶設定

MAX_TOKENS=10                 # max saved tokens per user
REFILL_TOKENS=2               # tokens added each refill
REFILL_INTERVAL_SECONDS=300   # 5 minutes
REQUEST_TOKEN_COST=1          # cost per accepted completion request

每個攜帶權杖的令牌都獲得它自己的桶。沒有攜帶權杖的請求會根據遠程 IP 分桶。如果代理應該拒絕未知客戶密鑰,請設置PROXY_API_KEYS=key1,key2

當桶為空時,/v1/chat/completions/v1/completions會返回一個正常的 OpenAI �風格的助手響應:

limit reached, wait 5 min

測試請求

curl http://localhost:8888/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "anything",
    "messages": [{"role": "user", "content": "hello"}]
  }'

選擇性估計令牌模式

預設情況下,一次完成請求耗費REQUEST_TOKEN_COST桶代幣。為根據提示大小加上預期輸出大致收費:

TOKEN_COST_MODE=estimate RESPONSE_TOKEN_RESERVE=256 ruby llm_proxy.rb

這僅是原型的一個近似值。