GitHub - damien220/code-mapper: Generate a compact PROJECT_CONTEXT.md so LLMs understand your codebase in one read — not fifty.
Damien_220 · 2026-05-25 · via Hacker News - Newest: "LLM"

Generate a compact PROJECT_CONTEXT.md so LLMs understand your codebase in one read — not fifty.

When an LLM opens an unfamiliar project, it reads file after file to build a mental model. On a ~4,200-line project that costs ~21,000 tokens before a single line of code is written. code-mapper replaces that scan with a single structured file that captures the same architectural picture in ~4,700 tokens, a 78% reduction.

$ python code_mapper.py ./my-project

✓  Generated: my-project/PROJECT_CONTEXT.md
   Files analyzed : 40  (python×40)
   Classes        : 20  |  Functions: 123
   Output size    : 18,846 chars  (~4,711 tokens)
   Source lines   : ~4,210  (~21,050 tokens to read raw)
   Token savings  : ~78%

What it generates

A single PROJECT_CONTEXT.md with four sections:

File Structure

my-project/
├── src/
│   ├── api/
│   │   └── router.py
│   └── core/
│       └── engine.py
└── tests/
    └── test_engine.py

Class Diagram

classDiagram
    class Engine {
        +config: dict
        +run()
        +stop()
        +__init__()
    }
    class BaseEngine["«abstract» BaseEngine"] {
        +run()
    }
    BaseEngine <|-- Engine : extends

Module Dependency Graph

graph TD
    router --> engine
    engine --> config
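
For Python sources, a graph like this could be derived by parsing each file's imports and keeping only the edges that point at other modules in the project. The following is a minimal sketch under that assumption, not the logic in code_mapper.py; `imported_modules` and `mermaid_dependency_graph` are hypothetical names, and relative imports such as `from . import x` are ignored.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the last dotted component of every module imported by `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[-1] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[-1])
    return names


def mermaid_dependency_graph(modules: dict[str, str]) -> str:
    """Render project-internal imports as a Mermaid `graph TD` block.

    `modules` maps a module name to its source text (hypothetical input shape).
    """
    lines = ["graph TD"]
    for name in sorted(modules):
        # Keep only imports that refer to another module of the same project.
        internal = imported_modules(modules[name]) & set(modules)
        for dep in sorted(internal):
            if dep != name:
                lines.append(f"    {name} --> {dep}")
    return "\n".join(lines)


if __name__ == "__main__":
    project = {
        "router": "from core.engine import Engine\n",
        "engine": "import config\n",
        "config": "DEBUG = False\n",
    }
    print(mermaid_dependency_graph(project))
```

Edges are emitted in sorted order so that regenerating the file produces a stable diff.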

Symbol Index

**`src/core/engine.py`** `python`
> Core processing engine for async task execution.
- `«abstract» class BaseEngine`  — Base class for all engines
- `class Engine(BaseEngine)`  — Production engine implementation
  methods: `run`, `stop`, `reload`
- `async def create_engine(config: dict) → Engine`  — Factory function

Benchmarks

| Project | Source lines | Raw token cost | code-mapper token cost | Savings |
|---|---|---|---|---|
| Small (~800 lines) | ~777 | ~3,885 | ~595 | 85% |
| Medium (~400 lines) | ~379 | ~1,895 | ~522 | 72% |
| Large (~4,200 lines) | ~4,210 | ~21,050 | ~4,711 | 78% |
| Large (~4,600 lines) | ~4,649 | ~23,245 | ~6,056 | 74% |
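
The figures above are consistent with two simple heuristics: about 5 tokens per source line for the raw cost, and about 4 characters per token for the generated file (18,846 chars and ~4,711 tokens in the sample run). The sketch below reproduces the Savings column under those assumptions. Both constants are inferred from the numbers in this README rather than taken from code_mapper.py, and the function names are hypothetical.

```python
# Heuristics inferred from the README's numbers (assumptions, not the script's code).
TOKENS_PER_LINE = 5   # 4,210 lines  -> ~21,050 tokens
CHARS_PER_TOKEN = 4   # 18,846 chars -> ~4,711 tokens


def raw_token_cost(source_lines: int) -> int:
    """Estimate the tokens needed to read the raw source files."""
    return source_lines * TOKENS_PER_LINE


def context_token_cost(output_chars: int) -> int:
    """Estimate the tokens in the generated PROJECT_CONTEXT.md."""
    return output_chars // CHARS_PER_TOKEN


def savings_percent(raw_tokens: int, context_tokens: int) -> int:
    """Share of tokens saved, rounded to the nearest whole percent."""
    return round(100 * (1 - context_tokens / raw_tokens))


if __name__ == "__main__":
    # (source lines, code-mapper tokens) for each benchmark row
    rows = [(777, 595), (379, 522), (4210, 4711), (4649, 6056)]
    for lines, mapped in rows:
        raw = raw_token_cost(lines)
        print(f"{lines:>5} lines: {raw:>6} raw, {mapped:>5} mapped, "
              f"{savings_percent(raw, mapped)}% saved")
```

Actual token counts depend on the model's tokenizer, so these values are best read as estimates.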

Installation

No dependencies beyond Python 3.9+.

# Copy to a global scripts folder
mkdir -p ~/.claude/scripts
cp code_mapper.py ~/.claude/scripts/

# Or just run it in place
python code_mapper.py ./your-project

Claude Code agent (optional)

To use it as a Claude Code subagent (@code-mapper map this project):

mkdir -p your-project/.claude/agents/scripts
cp agent.md your-project/.claude/agents/code-mapper.md
cp code_mapper.py your-project/.claude/agents/scripts/

Usage

# Basic — writes PROJECT_CONTEXT.md inside the project
python code_mapper.py ./my-project

# Custom output path
python code_mapper.py ./my-project -o .claude/PROJECT_CONTEXT.md

# Print to stdout (preview / pipe)
python code_mapper.py ./my-project --stdout

# Map only a sub-directory (large monorepos)
python code_mapper.py ./my-project/src/api -o .claude/context_api.md

Recommended CLAUDE.md snippet:

## Session start

Read `.claude/PROJECT_CONTEXT.md` before exploring any source files.
Re-generate with `python ~/.claude/scripts/code_mapper.py .` after adding modules or classes.

Supported languages

| Language | Parser | Accuracy |
|---|---|---|
| Python | Built-in ast module | Exact |
| TypeScript / JavaScript | Regex | High |
| Java | Regex | High |
| Go | Regex | High |
| Rust | Regex | High |
| C# | Regex | High |
| Ruby / PHP / Kotlin / Swift / C++ | Regex | Moderate |

Python gets the richest output (typed attributes, docstrings, @dataclass and @abstractmethod detection). All other languages extract class hierarchies, public functions, and import relationships via regex — accurate for standard code patterns.
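
To illustrate why the ast route gives exact results, here is a minimal sketch of class extraction with the standard library. It is not the code in code_mapper.py: `describe_classes`, `_decorator_name`, and the record layout are hypothetical, and only top-level classes are inspected.

```python
import ast


def _decorator_name(node: ast.expr) -> str:
    """Trailing name of a decorator, e.g. `abc.abstractmethod` or `dataclass(frozen=True)`."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def describe_classes(source: str) -> list[dict]:
    """Collect name, bases, public methods, docstring and flags for each top-level class."""
    result = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        methods = [n for n in node.body
                   if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
        result.append({
            "name": node.name,
            "bases": [ast.unparse(base) for base in node.bases],
            "methods": [m.name for m in methods if not m.name.startswith("_")],
            "doc": ast.get_docstring(node) or "",
            # A class is flagged abstract if any method carries @abstractmethod.
            "abstract": any(_decorator_name(d) == "abstractmethod"
                            for m in methods for d in m.decorator_list),
            "dataclass": any(_decorator_name(d) == "dataclass"
                             for d in node.decorator_list),
        })
    return result


SAMPLE = '''
from abc import ABC, abstractmethod

class BaseEngine(ABC):
    """Base class for all engines"""
    @abstractmethod
    def run(self): ...

class Engine(BaseEngine):
    """Production engine implementation"""
    def run(self): ...
    def stop(self): ...
    def _reset(self): ...
'''

if __name__ == "__main__":
    for record in describe_classes(SAMPLE):
        print(record)
```

Because the parser works on the syntax tree rather than on text, decorators, base classes, and docstrings are read directly instead of being guessed from line patterns.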


Why Mermaid + Symbol Index?

The Mermaid + symbol index format was compared against the read-everything baseline and three common alternatives:

| Method | Token cost | LLM comprehension speed | Covers public API |
|---|---|---|---|
| Read all source files | Highest (baseline) | Slowest | Yes |
| Full repo dump (pack all files) | Same as baseline | Slow | Yes |
| Code dependency graph only | Medium | Medium | Partial |
| Raw UML / PlantUML | 2–3× Mermaid | Slower | Yes |
| Mermaid + symbol index | Lowest | Fastest | Yes |

Mermaid was chosen because it is the most token-dense visual format (up to 24× more efficient than XML/JSON diagram formats) and modern LLMs are trained extensively on it. The symbol index covers what diagrams can't — exact function signatures and docstrings that let an LLM call the right methods without reading implementations.
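
To make the format concrete, a class diagram in the shape shown under "Class Diagram" can be emitted from plain records with a few lines of string building. This is a sketch only: `mermaid_class_diagram` and the record keys (`name`, `bases`, `methods`) are hypothetical and do not describe how code_mapper.py stores its data.

```python
def mermaid_class_diagram(classes: list[dict]) -> str:
    """Render class records as Mermaid `classDiagram` text."""
    lines = ["classDiagram"]
    # First pass: one block per class listing its public methods.
    for cls in classes:
        lines.append(f"    class {cls['name']} {{")
        lines.extend(f"        +{method}()" for method in cls.get("methods", []))
        lines.append("    }")
    # Second pass: inheritance edges, parent on the left.
    for cls in classes:
        for base in cls.get("bases", []):
            lines.append(f"    {base} <|-- {cls['name']} : extends")
    return "\n".join(lines)


if __name__ == "__main__":
    print(mermaid_class_diagram([
        {"name": "BaseEngine", "bases": [], "methods": ["run"]},
        {"name": "Engine", "bases": ["BaseEngine"], "methods": ["run", "stop"]},
    ]))
```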

Full comparison with real token counts: METHOD_COMPARISON.md


Handling large codebases

For projects with 10,000+ lines, map sub-directories independently:

# Backend only
python code_mapper.py ./src/backend -o .claude/context_backend.md

# Frontend only
python code_mapper.py ./src/frontend -o .claude/context_frontend.md

Load only the context relevant to your current task. See USAGE_GUIDE.md for the full large-codebase strategy.
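
If a project has many sub-directories, the per-directory invocations can be generated rather than typed by hand. The sketch below only builds the command lines, using the `-o` flag documented above; `mapper_commands` is a hypothetical helper and the `context_<name>.md` naming simply follows the examples in this section.

```python
import sys
from pathlib import Path


def mapper_commands(project: Path, subdirs: list[str],
                    script: str = "code_mapper.py") -> list[list[str]]:
    """Build one code_mapper.py invocation per sub-directory."""
    commands = []
    for sub in subdirs:
        target = (project / sub).as_posix()
        # Name the output after the last path component, e.g. context_backend.md
        output = (Path(".claude") / f"context_{Path(sub).name}.md").as_posix()
        commands.append([sys.executable, script, target, "-o", output])
    return commands


if __name__ == "__main__":
    for cmd in mapper_commands(Path("."), ["src/backend", "src/frontend"]):
        # Printed for review; pass each list to subprocess.run(cmd, check=True) to execute.
        print(" ".join(cmd))
```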


Files

code-mapper/
├── code_mapper.py       — Main script (run this)
├── agent.md             — Claude Code subagent definition
├── README.md            — This file
├── USAGE_GUIDE.md       — Full usage guide (new/existing/large projects)
└── METHOD_COMPARISON.md — Token cost comparison vs alternative approaches

Requirements

  • Python 3.9+
  • No third-party packages

Contributing

Contributions are welcome. Some areas that would improve the tool:

  • Better JS/TS parsing — replacing regex with a proper AST parser (e.g. via node subprocess or tree-sitter bindings) would make the class diagram and symbol index as accurate for TypeScript as it currently is for Python.
  • New language support — adding patterns for languages not yet covered (Scala, Dart, Elixir, etc.) is straightforward: add entries to the three pattern dicts in RegexParser.
  • Smarter large-codebase truncation — automatically ranking symbols by how often they are referenced (PageRank-style, similar to what aider does) so the output stays under a token budget without manual pruning.
  • Config file support — a .codemapper.yml per project to set ignore rules, token budget, and output path.
  • IDE / editor hooks — auto-regenerate PROJECT_CONTEXT.md on file save in VS Code or JetBrains.
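
As a rough picture of what the "new language support" item involves, the sketch below shows one plausible layout for three pattern dicts (classes, functions, imports) with a Scala entry added. The internals of RegexParser are not shown in this README, so the dict names, the `extract_symbols` helper, and the patterns themselves are all hypothetical.

```python
import re

# Hypothetical layout: one dict per symbol kind, keyed by language name.
CLASS_PATTERNS = {
    "scala": re.compile(
        r"^\s*(?:(?:abstract|case|final|sealed)\s+)*(?:class|trait|object)\s+(\w+)",
        re.MULTILINE,
    ),
}
FUNCTION_PATTERNS = {
    # Anchoring `def` at the line start skips `private def ...` declarations.
    "scala": re.compile(r"^\s*def\s+(\w+)", re.MULTILINE),
}
IMPORT_PATTERNS = {
    "scala": re.compile(r"^\s*import\s+([\w.]+)", re.MULTILINE),
}


def extract_symbols(source: str, language: str) -> dict[str, list[str]]:
    """Apply the three patterns registered for `language` to `source`."""
    return {
        "classes": CLASS_PATTERNS[language].findall(source),
        "functions": FUNCTION_PATTERNS[language].findall(source),
        "imports": IMPORT_PATTERNS[language].findall(source),
    }


SCALA_SAMPLE = """
import scala.collection.mutable

sealed trait Engine
case class FastEngine(config: Map[String, String]) extends Engine {
  def run(): Unit = ()
  private def reset(): Unit = ()
}
"""

if __name__ == "__main__":
    print(extract_symbols(SCALA_SAMPLE, "scala"))
```

Line-anchored patterns like these handle conventional formatting but miss multi-line declarations, which matches the "accurate for standard code patterns" caveat for the regex-based languages.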

To contribute: fork the repo, make your change, and open a pull request. There are no contribution requirements beyond keeping the zero-dependency constraint (Python stdlib only for the core script).


License

This project is open source and free to use, modify, and redistribute — no license restrictions apply. Do whatever you want with it.


Support This Project

If you find this useful, consider supporting its development. Your contributions help keep the project maintained, fund new features, and cover infrastructure costs.

| Platform | Type | Link |
|---|---|---|
| Buy Me a Coffee | One-time or monthly support | buymeacoffee.com/ashrafalnas |
| Patreon | Recurring monthly membership | patreon.com/c/unrealpatr/ |