code-mapper (GitHub: damien220/code-mapper)

Generate a compact PROJECT_CONTEXT.md so LLMs understand your codebase in one read — not fifty.

When an LLM opens a project it doesn't know, it reads file after file to build a mental model. On a ~4,200-line project that costs ~21,000 tokens before a single line of code is written. code-mapper replaces that scan with a single structured file that captures the same architectural picture in ~4,700 tokens, a 78% reduction.

$ python code_mapper.py ./my-project

✓  Generated: my-project/PROJECT_CONTEXT.md
   Files analyzed : 40  (python×40)
   Classes        : 20  |  Functions: 123
   Output size    : 18,846 chars  (~4,711 tokens)
   Source lines   : ~4,210  (~21,050 tokens to read raw)
   Token savings  : ~78%
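The figures in this summary can be reproduced with simple arithmetic. The sketch below assumes about 5 tokens per source line and about 4 characters per token; those ratios are inferred from the numbers shown above, not taken from the script itself, and `estimate_savings` is a hypothetical helper name.

```python
def estimate_savings(source_lines: int, output_chars: int,
                     tokens_per_line: int = 5, chars_per_token: int = 4) -> dict:
    """Estimate token savings from line and character counts.

    The two ratios are assumptions inferred from the sample output,
    not values read from code_mapper.py.
    """
    raw_tokens = source_lines * tokens_per_line        # cost of reading every file
    output_tokens = output_chars // chars_per_token    # cost of reading the context file
    savings_pct = round(100 * (1 - output_tokens / raw_tokens))
    return {
        "raw_tokens": raw_tokens,
        "output_tokens": output_tokens,
        "savings_pct": savings_pct,
    }


print(estimate_savings(source_lines=4210, output_chars=18846))
# {'raw_tokens': 21050, 'output_tokens': 4711, 'savings_pct': 78}
```

Real tokenizer counts vary by model, so these numbers are best read as rough estimates.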

What it generates

A single PROJECT_CONTEXT.md with four sections:

File Structure

my-project/
├── src/
│   ├── api/
│   │   └── router.py
│   └── core/
│       └── engine.py
└── tests/
    └── test_engine.py

Class Diagram

classDiagram
    class Engine {
        +config: dict
        +run()
        +stop()
        +__init__()
    }
    class BaseEngine["«abstract» BaseEngine"] {
        +run()
    }
    BaseEngine <|-- Engine : extends
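As an illustration of how a diagram like the one above could be derived from Python source, here is a minimal sketch built on the standard `ast` module. It is not the script's actual implementation: `class_diagram` and `SAMPLE` are hypothetical names, and attributes, abstract markers, and visibility are left out for brevity.

```python
import ast


def class_diagram(source: str) -> str:
    """Render top-level classes in `source` as Mermaid classDiagram text."""
    tree = ast.parse(source)
    lines = ["classDiagram"]
    edges = []
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        lines.append(f"    class {node.name} {{")
        for item in node.body:
            # Both sync and async methods become diagram members.
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.append(f"        +{item.name}()")
        lines.append("    }")
        for base in node.bases:
            # Only simple names are handled; dotted bases are skipped here.
            if isinstance(base, ast.Name):
                edges.append(f"    {base.id} <|-- {node.name} : extends")
    return "\n".join(lines + edges)


SAMPLE = '''
class BaseEngine:
    def run(self): ...

class Engine(BaseEngine):
    def run(self): ...
    def stop(self): ...
'''

print(class_diagram(SAMPLE))
```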

Module Dependency Graph

graph TD
    router --> engine
    engine --> config
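A dependency graph of this shape can be approximated by collecting import statements. The following is a sketch under stated assumptions: modules are passed in as a name-to-source mapping, only the last segment of each import path is matched against known module names, and relative imports without a module name are ignored. `dependency_edges` and `to_mermaid` are hypothetical helpers, not functions from the script.

```python
import ast


def dependency_edges(modules: dict[str, str]) -> list[tuple[str, str]]:
    """Return sorted (importer, imported) pairs between the given modules."""
    edges = set()
    for name, source in modules.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            for target in targets:
                leaf = target.split(".")[-1]  # "core.engine" -> "engine"
                if leaf in modules and leaf != name:
                    edges.add((name, leaf))
    return sorted(edges)


def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Format edges as a top-down Mermaid graph."""
    return "\n".join(["graph TD"] + [f"    {a} --> {b}" for a, b in edges])


MODULES = {
    "router": "from core.engine import Engine\n",
    "engine": "import config\n",
    "config": "DEBUG = False\n",
}

print(to_mermaid(dependency_edges(MODULES)))
```

Imports of third-party or standard-library packages fall out naturally, because only names present in the mapping produce edges.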

Symbol Index

**`src/core/engine.py`** `python`
> Core processing engine for async task execution.
- `«abstract» class BaseEngine`  — Base class for all engines
- `class Engine(BaseEngine)`  — Production engine implementation
  methods: `run`, `stop`, `reload`
- `async def create_engine(config: dict) → Engine`  — Factory function
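To show where entries like these can come from, here is a minimal sketch that builds a simplified symbol index with `ast`. It assumes Python 3.9+ (for `ast.unparse`) and uses a colon instead of the separator shown above; `symbol_index` and `SAMPLE` are hypothetical names, and the real script's output format may differ.

```python
import ast


def _summary(node) -> str:
    """First line of a docstring, or an empty string when there is none."""
    doc = ast.get_docstring(node)
    return doc.splitlines()[0] if doc else ""


def symbol_index(path: str, source: str) -> str:
    """List top-level classes and functions with signatures and summaries."""
    tree = ast.parse(source)
    lines = [f"**`{path}`** `python`"]
    if _summary(tree):
        lines.append(f"> {_summary(tree)}")
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(b) for b in node.bases)
            sig = f"class {node.name}({bases})" if bases else f"class {node.name}"
            # Public methods only: names starting with "_" are skipped.
            methods = [n.name for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
                       and not n.name.startswith("_")]
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            sig = f"{keyword} {node.name}({ast.unparse(node.args)}){returns}"
            methods = []
        else:
            continue
        doc = _summary(node)
        lines.append(f"- `{sig}`" + (f": {doc}" if doc else ""))
        if methods:
            lines.append("  methods: " + ", ".join(f"`{m}`" for m in methods))
    return "\n".join(lines)


SAMPLE = '''"""Core processing engine for async task execution."""

class Engine(BaseEngine):
    """Production engine implementation"""
    def run(self): ...
    def stop(self): ...

async def create_engine(config: dict) -> Engine:
    """Factory function"""
'''

print(symbol_index("src/core/engine.py", SAMPLE))
```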

Benchmarks

Project	Source lines	Raw token cost (tokens)	code-mapper output (tokens)	Savings
Small (~800 lines)	~777	~3,885	~595	85%
Medium (~400 lines)	~379	~1,895	~522	72%
Large (~4,200 lines)	~4,210	~21,050	~4,711	78%
Large (~4,600 lines)	~4,649	~23,245	~6,056	74%

Installation

No dependencies beyond Python 3.9+.

# Copy to a global scripts folder
mkdir -p ~/.claude/scripts
cp code_mapper.py ~/.claude/scripts/

# Or just run it in place
python code_mapper.py ./your-project

Claude Code agent (optional)

To use it as a Claude Code subagent (@code-mapper map this project):

mkdir -p your-project/.claude/agents/scripts
cp agent.md your-project/.claude/agents/code-mapper.md
cp code_mapper.py your-project/.claude/agents/scripts/

Usage

# Basic — writes PROJECT_CONTEXT.md inside the project
python code_mapper.py ./my-project

# Custom output path
python code_mapper.py ./my-project -o .claude/PROJECT_CONTEXT.md

# Print to stdout (preview / pipe)
python code_mapper.py ./my-project --stdout

# Map only a sub-directory (large monorepos)
python code_mapper.py ./my-project/src/api -o .claude/context_api.md

Recommended CLAUDE.md snippet:

## Session start

Read `.claude/PROJECT_CONTEXT.md` before exploring any source files.
Re-generate with `python ~/.claude/scripts/code_mapper.py . -o .claude/PROJECT_CONTEXT.md` after adding modules or classes.

Supported languages

Language	Parser	Accuracy
Python	Built-in `ast` module	Exact
TypeScript / JavaScript	Regex	High
Java	Regex	High
Go	Regex	High
Rust	Regex	High
C#	Regex	High
Ruby / PHP / Kotlin / Swift / C++	Regex	Moderate

Python gets the richest output (typed attributes, docstrings, @dataclass and @abstractmethod detection). All other languages extract class hierarchies, public functions, and import relationships via regex — accurate for standard code patterns.
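To make the regex approach concrete, here is a small sketch that pulls class names and their base classes out of TypeScript source. The pattern is an illustration written for this example, not one of the patterns the script actually uses, and `CLASS_RE` and `extract_classes` are hypothetical names.

```python
import re

# Matches lines such as "export abstract class Foo extends Bar".
# Assumption: one class declaration per line, no generics before "extends".
CLASS_RE = re.compile(
    r"^\s*(?:export\s+)?(?:abstract\s+)?class\s+(?P<name>\w+)"
    r"(?:\s+extends\s+(?P<base>[\w.]+))?",
    re.MULTILINE,
)


def extract_classes(source: str) -> list[tuple[str, "str | None"]]:
    """Return (class_name, base_class_or_None) pairs found in the source."""
    return [(m.group("name"), m.group("base")) for m in CLASS_RE.finditer(source)]


TS_SAMPLE = """
export abstract class BaseEngine {
  abstract run(): void;
}

export class Engine extends BaseEngine {
  run(): void {}
}
"""

print(extract_classes(TS_SAMPLE))
# [('BaseEngine', None), ('Engine', 'BaseEngine')]
```

A pattern like this handles conventional declarations but would miss the base class in `class Box<T> extends Base`, which is the kind of gap behind the "High" rather than "Exact" accuracy rating.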

Why Mermaid + Symbol Index?

Three common alternatives were evaluated against the baseline of reading every source file:

Method	Token cost	LLM comprehension speed	Covers public API
Read all source files	Highest (baseline)	Slowest	Yes
Full repo dump (pack all files)	Same as baseline	Slow	Yes
Code dependency graph only	Medium	Medium	Partial
Raw UML / PlantUML	2–3× Mermaid	Slower	Yes
Mermaid + symbol index	Lowest	Fastest	Yes

Mermaid was chosen because it is the most token-dense visual format (up to 24× more efficient than XML/JSON diagram formats) and modern LLMs are trained extensively on it. The symbol index covers what diagrams can't — exact function signatures and docstrings that let an LLM call the right methods without reading implementations.

Full comparison with real token counts: METHOD_COMPARISON.md

Handling large codebases

For projects with 10,000+ lines, map sub-directories independently:

# Backend only
python code_mapper.py ./src/backend -o .claude/context_backend.md

# Frontend only
python code_mapper.py ./src/frontend -o .claude/context_frontend.md

Load only the context relevant to your current task. See USAGE_GUIDE.md for the full large-codebase strategy.
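When a monorepo has many sub-directories, the per-directory commands above can be generated rather than typed by hand. This sketch only builds the argument lists, using the `-o` flag documented in Usage; `build_commands` is a hypothetical helper, and the `context_<name>.md` naming simply follows the examples above.

```python
from pathlib import Path


def build_commands(root: str, subdirs: list[str],
                   script: str = "code_mapper.py",
                   out_dir: str = ".claude") -> list[list[str]]:
    """Build one code_mapper invocation per sub-directory."""
    commands = []
    for sub in subdirs:
        target = Path(root) / sub
        output = Path(out_dir) / f"context_{Path(sub).name}.md"
        commands.append(
            ["python", script, target.as_posix(), "-o", output.as_posix()]
        )
    return commands


for cmd in build_commands("./src", ["backend", "frontend"]):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute the mapping.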

Files

code-mapper/
├── code_mapper.py       — Main script (run this)
├── agent.md             — Claude Code subagent definition
├── README.md            — This file
├── USAGE_GUIDE.md       — Full usage guide (new/existing/large projects)
└── METHOD_COMPARISON.md — Token cost comparison vs alternative approaches

Requirements

Python 3.9+
No third-party packages

Contributing

Contributions are welcome. Some areas that would improve the tool:

Better JS/TS parsing — replacing regex with a proper AST parser (e.g. via node subprocess or tree-sitter bindings) would make the class diagram and symbol index as accurate for TypeScript as it currently is for Python.
New language support — adding patterns for languages not yet covered (Scala, Dart, Elixir, etc.) is straightforward: add entries to the three pattern dicts in RegexParser.
Smarter large-codebase truncation — automatically ranking symbols by how often they are referenced (PageRank-style, similar to what aider does) so the output stays under a token budget without manual pruning.
Config file support — a .codemapper.yml per project to set ignore rules, token budget, and output path.
IDE / editor hooks — auto-regenerate PROJECT_CONTEXT.md on file save in VS Code or JetBrains.
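For anyone picking up the truncation idea, one possible starting point is a greedy selection by reference count. This is a deliberately simple stand-in, not PageRank and not existing code in the tool: `select_symbols`, the reference counts, and the per-symbol token costs are all hypothetical.

```python
def select_symbols(ref_counts: dict[str, int],
                   token_costs: dict[str, int],
                   budget: int) -> list[str]:
    """Keep the most-referenced symbols that fit within a token budget."""
    # Most referenced first; ties are broken alphabetically for stable output.
    ranked = sorted(ref_counts, key=lambda sym: (-ref_counts[sym], sym))
    kept, used = [], 0
    for sym in ranked:
        cost = token_costs[sym]
        if used + cost <= budget:  # skip symbols that would exceed the budget
            kept.append(sym)
            used += cost
    return kept


REFS = {"Engine.run": 12, "create_engine": 7, "BaseEngine": 5, "Engine.reload": 1}
COSTS = {"Engine.run": 40, "create_engine": 60, "BaseEngine": 50, "Engine.reload": 20}

print(select_symbols(REFS, COSTS, budget=120))
# ['Engine.run', 'create_engine', 'Engine.reload']
```

A graph-based ranking would replace the raw counts with scores that also weigh who is doing the referencing, while the budget loop could stay the same.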

To contribute: fork the repo, make your change, and open a pull request. There are no contribution requirements beyond keeping the zero-dependency constraint (Python stdlib only for the core script).

License

This project is open source and free to use, modify, and redistribute — no license restrictions apply. Do whatever you want with it.

Support This Project

If you find this useful, consider supporting its development. Your contributions help keep the project maintained, fund new features, and cover infrastructure costs.

Donate

Platform	Type	Link
Buy Me a Coffee	One-time or monthly support	buymeacoffee.com/ashrafalnas
Patreon	Recurring monthly membership	patreon.com/c/unrealpatr/
