GitHub - skorotkiewicz/llm-rt: A small Ruby prototype of an OpenAI-compatible LLM proxy with a refillable token bucket
modinfo · 2026-05-28 · via Hacker News - Newest: "LLM"

A small Ruby prototype of an OpenAI-compatible LLM proxy that rate-limits clients with a refillable token bucket.

It uses only the Ruby standard library: no gems, no Rack, no WEBrick.

Run

BASE_API_URL=http://192.168.0.124:8888/v1 \
BASE_API_KEY=1mmer \
BASE_MODEL=gemma4 \
ruby llm_proxy.rb

By default, the proxy listens on 0.0.0.0:8899.

To run the saved local setup against your local LLM (192.168.0.124:8888):

./run_local_proxy.sh

This starts the Ruby proxy at http://127.0.0.1:8899/v1 and forwards requests to http://192.168.0.124:8888/v1.

The saved local curl check is:

./curl_local_proxy.sh

Manual equivalent:

curl -sS -i -m 60 http://127.0.0.1:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma4",
    "messages": [{"role": "user", "content": "Reply with exactly: proxy ok"}],
    "max_tokens": 16
  }'

Verified result through the proxy: the upstream replied with proxy ok, and the proxy returned X-RateLimit-Remaining: 0 with the local test bucket.
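The same check can be made from Ruby with only the standard library. This is a minimal sketch, not code from the repository: `build_chat_request` and `send_chat_request` are hypothetical helper names, and the URL, bearer token, and model are the ones used in the curl command above.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the same chat completion request the curl command sends.
# (Hypothetical helper, not part of the repository.)
def build_chat_request(uri, bearer, model, prompt, max_tokens = 16)
  req = Net::HTTP::Post.new(uri)
  req['Authorization'] = "Bearer #{bearer}"
  req['Content-Type'] = 'application/json'
  req.body = JSON.generate(
    model: model,
    messages: [{ role: 'user', content: prompt }],
    max_tokens: max_tokens
  )
  req
end

# Send the request and return [status code, X-RateLimit-Remaining, body].
def send_chat_request(uri, req, timeout = 60)
  Net::HTTP.start(uri.host, uri.port, read_timeout: timeout) do |http|
    res = http.request(req)
    [res.code, res['X-RateLimit-Remaining'], res.body]
  end
end

# Usage, with the proxy running locally:
#   uri = URI('http://127.0.0.1:8899/v1/chat/completions')
#   req = build_chat_request(uri, 'user-a', 'gemma4', 'Reply with exactly: proxy ok')
#   p send_chat_request(uri, req)
```

Reading `X-RateLimit-Remaining` from the response is a quick way to confirm that the request was charged against the expected bucket.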

Run the smoke test:

ruby test_llm_proxy.rb

Token bucket settings

MAX_TOKENS=10                 # max saved tokens per user
REFILL_TOKENS=2               # tokens added each refill
REFILL_INTERVAL_SECONDS=300   # 5 minutes
REQUEST_TOKEN_COST=1          # cost per accepted completion request
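
How these four settings might interact can be sketched as a small refillable bucket. This is an illustration only: the `TokenBucket` class is hypothetical and not the repository's code, and it assumes that a new bucket starts full and that refills happen in whole intervals.

```ruby
# Hypothetical refillable token bucket; not the repository's actual class.
class TokenBucket
  attr_reader :tokens

  def initialize(max_tokens: 10, refill_tokens: 2, refill_interval: 300, now: Time.now.to_f)
    @max_tokens = max_tokens            # MAX_TOKENS
    @refill_tokens = refill_tokens      # REFILL_TOKENS
    @refill_interval = refill_interval  # REFILL_INTERVAL_SECONDS
    @tokens = max_tokens                # assumption: a new bucket starts full
    @last_refill = now
  end

  # Add refill_tokens for every whole interval elapsed, capped at max_tokens.
  def refill(now = Time.now.to_f)
    intervals = ((now - @last_refill) / @refill_interval).floor
    return if intervals <= 0

    @tokens = [@tokens + intervals * @refill_tokens, @max_tokens].min
    @last_refill += intervals * @refill_interval
  end

  # Try to spend `cost` tokens (REQUEST_TOKEN_COST); true if accepted.
  def take(cost = 1, now = Time.now.to_f)
    refill(now)
    return false if @tokens < cost

    @tokens -= cost
    true
  end
end
```

With the values above, a client that drains its bucket can make 2 more requests after each 5-minute interval, and never holds more than 10 at once.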

Each bearer token gets its own bucket. Requests without a token are bucketed by remote IP. Set PROXY_API_KEYS=key1,key2 if the proxy should reject unknown client keys.
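
The bucket selection described above could look roughly like the following. Both `bucket_key` and `parse_allowed_keys` are hypothetical helpers, not functions from the repository. The sketch also assumes that when PROXY_API_KEYS is set, a request with no token is rejected along with unknown keys; the README does not state this explicitly.

```ruby
# Hypothetical helper: split a PROXY_API_KEYS value into a list of keys.
def parse_allowed_keys(raw)
  raw.to_s.split(',').map(&:strip).reject(&:empty?)
end

# Hypothetical helper: choose the bucket key for a request.
# authorization - raw Authorization header value, or nil
# remote_ip     - peer address, used when no bearer token is sent
# allowed_keys  - parsed PROXY_API_KEYS list; empty means "accept any key"
# Returns nil when the caller should reject the request.
def bucket_key(authorization, remote_ip, allowed_keys = [])
  token = authorization.to_s[/\ABearer\s+(.+)\z/i, 1]&.strip
  token = nil if token && token.empty?

  # Assumption: with an allow-list, missing and unknown keys are both rejected.
  return nil if !allowed_keys.empty? && !allowed_keys.include?(token)

  token ? "token:#{token}" : "ip:#{remote_ip}"
end
```

Prefixing the key with `token:` or `ip:` keeps a bearer token that happens to look like an IP address from sharing a bucket with that address.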

When a bucket is empty, /v1/chat/completions and /v1/completions return a normal OpenAI-style assistant response:

limit reached, wait 5 min
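
A sketch of what such a reply might look like for /v1/chat/completions, following the general shape of an OpenAI chat completion object. `limit_reached_response` is a hypothetical helper, and the exact fields the prototype emits (id format, usage counts) are assumptions. The legacy /v1/completions endpoint uses a `text` field instead of `message` and is not shown.

```ruby
require 'json'
require 'securerandom'

# Hypothetical builder for the "bucket empty" reply, shaped like an
# OpenAI chat.completion object so clients display it as assistant text.
def limit_reached_response(model, wait_minutes = 5, now = Time.now)
  {
    id: "chatcmpl-#{SecureRandom.hex(12)}", # assumption: any unique id works
    object: 'chat.completion',
    created: now.to_i,
    model: model,
    choices: [
      {
        index: 0,
        message: { role: 'assistant', content: "limit reached, wait #{wait_minutes} min" },
        finish_reason: 'stop'
      }
    ],
    usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }
  }
end

puts JSON.pretty_generate(limit_reached_response('gemma4'))
```

Returning a regular completion rather than an error status means chat clients show the message inline instead of failing.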

Test request

curl http://localhost:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "anything",
    "messages": [{"role": "user", "content": "hello"}]
  }'

Optional estimated-token mode

By default, one completion request costs REQUEST_TOKEN_COST bucket tokens. To charge roughly by prompt size and expected output instead:

TOKEN_COST_MODE=estimate RESPONSE_TOKEN_RESERVE=256 ruby llm_proxy.rb

This is only an approximation, suitable for a prototype.
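
One way such an estimate could be computed is sketched below. The README does not describe the actual formula, so everything here is an assumption: `estimated_token_cost` is a hypothetical helper, the 4-characters-per-token ratio is a common rough heuristic for English text rather than any model's tokenizer, and how the result maps onto bucket tokens is left open.

```ruby
# Assumption: roughly 4 characters per token. This is a coarse heuristic,
# not the tokenizer of any particular model.
CHARS_PER_TOKEN = 4

# Hypothetical estimator: prompt tokens guessed from message length,
# plus a fixed reserve for the response (RESPONSE_TOKEN_RESERVE).
def estimated_token_cost(messages, response_reserve = 256)
  prompt_chars = messages.sum { |m| m.fetch('content', '').to_s.length }
  prompt_tokens = (prompt_chars.to_f / CHARS_PER_TOKEN).ceil
  prompt_tokens + response_reserve
end

messages = [{ 'role' => 'user', 'content' => 'hello' }]
puts estimated_token_cost(messages) # 5 chars -> 2 prompt tokens, plus 256 reserve
```

Reserving a fixed amount for the response avoids waiting for the upstream reply before charging the bucket, at the cost of overcharging short answers.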