Graph-based Code Generation Exploration
In this repo I explore the idea of generating a software implementation via a structured
graph decomposition. The system consists of three phases: the planner, the plangraph, and
the codegenerator. The planner is a simple LLM loop that gathers user requirements by
asking the user clarifying questions until the model judges that it has adequate
information about the user's intent.
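The planner loop described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: `PlannerTurn`, `run_planner`, `model_step`, `ask_user`, and `max_rounds` are all hypothetical names, and the LLM call is abstracted into a plain callable so the control flow stands on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlannerTurn:
    """One model response (hypothetical shape)."""
    done: bool   # True when the model judges it has enough information
    text: str    # a clarifying question, or the final plan when done


def run_planner(
    request: str,
    model_step: Callable[[list[str]], PlannerTurn],
    ask_user: Callable[[str], str],
    max_rounds: int = 10,
) -> str:
    """Ask clarifying questions until the model declares the plan complete."""
    transcript = [f"user: {request}"]
    for _ in range(max_rounds):
        turn = model_step(transcript)
        if turn.done:
            return turn.text
        # Record the question and the user's answer for the next round.
        transcript.append(f"planner: {turn.text}")
        transcript.append(f"user: {ask_user(turn.text)}")
    raise RuntimeError("planner did not converge within max_rounds")
```

The `max_rounds` cap is an added assumption: since the stopping condition is the model's own judgment, some bound is useful to keep the loop from running indefinitely.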
In the plangraph phase, the user's plan is the input to a graph builder. Each graph is
a recursive decomposition of the plan into individual self-contained components. A
component can have inline responsibilities, which it handles itself, and delegated
responsibilities, which are handed off to other components. The latter determine whether
a component is a leaf or an intermediate node in the graph.
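One way to picture a plangraph node is the sketch below. It assumes that each delegated responsibility becomes a child component, so a node with no delegated work is a leaf. The `Component` class, its field names, and the calculator components shown are illustrative assumptions, not the structure or the example plangraph shipped in the repo.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Component:
    """A plangraph node (hypothetical shape)."""
    name: str
    inline: list[str] = field(default_factory=list)             # handled by this node
    delegated: list["Component"] = field(default_factory=list)  # handed to children

    @property
    def is_leaf(self) -> bool:
        # Assumption: a node with nothing delegated is a leaf.
        return not self.delegated

    def walk(self) -> Iterator["Component"]:
        """Yield this node, then every descendant, depth first."""
        yield self
        for child in self.delegated:
            yield from child.walk()


# Illustrative decomposition only; component names are made up.
calc = Component(
    "calculator",
    inline=["wire the components together"],
    delegated=[
        Component("parser", inline=["tokenize and parse input"]),
        Component(
            "ui",
            inline=["render the display"],
            delegated=[Component("keypad", inline=["map keys to actions"])],
        ),
    ],
)

leaves = [c.name for c in calc.walk() if c.is_leaf]
```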
The theory was that a graph structure can better ground the LLM's code generation
and allow it to focus on one system at a time without requiring a "whole-project" context.
The plangraph is an alternative to the typical LLM workflow, in which markdown planning
documents are injected into the prompt to generate the desired outcome.
Results
The plangraph module worked reasonably well despite my initial misgivings. In my tests,
the agent was able to recursively decompose problems in what seems like a reasonable
manner. The repo includes an example plangraph for a simple TUI calculator app. It
is generated from an example plan defined in main.py.
For the actual code generation, the coder agent and orchestrator agent both work pretty
well at their assigned tasks. In particular, I was rather impressed by how well the coder
could generate functional components from a plangraph node description. It really
reinforced my thesis that structured guardrails and precise deterministic tooling are the
right way to utilize agents. That being said, it does sometimes get stuck on outdated
syntax and in loops of failing tests, though this can be chalked up to a rather haphazard
implementation of a web search tool. In my tests I found that precise documentation lookup
is extremely important to a successful generation, much like how humans spend
time googling.
The orchestrator does its job, though there isn't much fanfare with it. It's a good way
to guide the graph generation process in a semi-deterministic way. It also facilitates
regeneration and user feedback, both of which are needed for a generative workflow.
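To make the orchestration idea concrete, here is a minimal sketch of a traversal that generates one node at a time, passes each node only the code of its direct children, and regenerates with feedback when a check fails. Everything here is an assumption made for illustration: the `Node` shape, the `generate` and `check` callables, the children-first visiting order, and the `max_attempts` limit are not taken from the repo.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Hypothetical plangraph node as seen by the orchestrator."""
    name: str
    description: str
    children: list["Node"] = field(default_factory=list)


def generate_graph(
    root: Node,
    generate: Callable[[Node, dict[str, str], str], str],
    check: Callable[[Node, str], str],
    max_attempts: int = 3,
) -> dict[str, str]:
    """Return {node name: source}, visiting children before their parent.

    `check` returns an empty string on success, or a failure message
    that is passed back to `generate` as feedback on the next attempt.
    """
    sources: dict[str, str] = {}

    def visit(node: Node) -> None:
        for child in node.children:
            visit(child)
        # Local context only: the code of direct children, not the whole project.
        context = {c.name: sources[c.name] for c in node.children}
        feedback = ""
        for _ in range(max_attempts):
            code = generate(node, context, feedback)
            feedback = check(node, code)
            if not feedback:
                sources[node.name] = code
                return
        raise RuntimeError(
            f"{node.name}: still failing after {max_attempts} attempts"
        )

    visit(root)
    return sources
```

Capping the attempts is one simple guard against the failing-test loops mentioned above; in a real run, `check` would execute the node's tests and `generate` would call the coder agent.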
The scaffolding around my code generation is also rather raw and relies on a global pyproject structure, but this isn't meant to be anything more than a research/learning experience.
Conclusion
The idea I had is "proven" in a sense, but there are some serious drawbacks. The first is that my iterative generation approach requires a lot of tokens. A single pass for the five-component calculator app runs for over an hour. I am sure a lot of this is due to LLM latency and my use of cheap models, but this type of decomposition is inefficient in both time and tokens.
Is the code that much better than what a dedicated claude-code/codex session can generate? I don't know; probably not. But it could potentially be more maintainable over the long term because of the plangraph structure, which was the real idea and star of the show.
























