How We Built Dynamic NPC Dialogue with LLMs
We're a small team at Vantage Digital Labs building AI tooling for game developers. Our first product is an NPC dialogue engine powered by LLMs — and we've been running it in early access for a few months now. Here's what we've learned.
The Problem
Traditional NPC dialogue is written by hand. Every line, every branch, every response to every possible player input. For a small studio making an RPG with 50 NPCs, that's thousands of lines of dialogue — and it's all static.
What if NPCs could respond dynamically? What if a merchant could actually react to what the player says, instead of cycling through 3 pre-written lines?
Our Architecture
We went with a simple but effective pipeline:
Player Input → Context Builder → LLM API → Response Parser → Game Engine
                    ↑                              |
                    └──── Memory / State ──────────┘
Context Builder — Injects the NPC's personality, location, knowledge, and recent conversation history into a system prompt.
LLM API — We started with GPT-4o-mini, then tested DeepSeek and Qwen. For cost-sensitive indie games, smaller models work surprisingly well if the prompt is good.
Response Parser — Extracts the dialogue text plus metadata like emotion tags ([emotion:happy]) and action tags ([action:wave]).
Memory — A simple relevance-scored store that lets NPCs "remember" past interactions.
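As a rough illustration of what "relevance-scored" can mean, here is a minimal sketch. It assumes memories are plain strings and that relevance is simple keyword overlap with the player's input; the scoring method and the names `MemoryStore`, `remember`, and `recall` are hypothetical and chosen for the example, not taken from the engine.

```javascript
// Hypothetical sketch: keyword-overlap scoring is an assumption for illustration.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9']+/g) || []);
}

class MemoryStore {
  constructor() {
    this.memories = []; // each entry: { text, tokens }
  }

  remember(text) {
    this.memories.push({ text, tokens: tokenize(text) });
  }

  // Return up to `limit` memories that share at least one word with the
  // query, highest overlap first.
  recall(query, limit = 3) {
    const queryTokens = tokenize(query);
    return this.memories
      .map((m) => ({
        text: m.text,
        score: [...m.tokens].filter((t) => queryTokens.has(t)).length,
      }))
      .filter((m) => m.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((m) => m.text);
  }
}

const store = new MemoryStore();
store.remember('Player sold a rusty sword for 5 gold');
store.remember('Player asked about the haunted mine');
const recalled = store.recall('Do you still have my sword?');
```

Here `recalled` contains only the sword memory, because "sword" is the one shared word. A real store would likely need stop-word filtering, recency weighting, or embeddings, since raw word overlap treats "the" and "sword" as equally meaningful.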
What Actually Matters
After running this for a few months, here's what we found:
1. System Prompt Engineering > Model Size
In our experience, a well-crafted system prompt with a 7B model beats a generic prompt with GPT-4. We spend more time on personality definitions and context injection than on model selection.
You are Goron, a friendly dwarven merchant who loves haggling.
Location: Marketplace
You know about: prices, rare items, local rumors
Respond in character. Keep replies under 3 sentences.
Short, specific, constrained. That's it.
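A prompt like the one above can be generated from structured NPC data rather than written by hand for each character. The sketch below is one possible approach; the `goron` object shape and the `buildSystemPrompt` helper are hypothetical names used for illustration.

```javascript
// Hypothetical NPC definition; the field names are illustrative.
const goron = {
  name: 'Goron',
  persona: 'a friendly dwarven merchant who loves haggling',
  location: 'Marketplace',
  knowledge: ['prices', 'rare items', 'local rumors'],
  maxSentences: 3,
};

// Render the definition into the short, constrained prompt format shown above.
function buildSystemPrompt(npc) {
  return [
    `You are ${npc.name}, ${npc.persona}.`,
    `Location: ${npc.location}`,
    `You know about: ${npc.knowledge.join(', ')}`,
    `Respond in character. Keep replies under ${npc.maxSentences} sentences.`,
  ].join('\n');
}

const systemPrompt = buildSystemPrompt(goron);
```

Keeping the definition as data means a designer can tweak location or knowledge per scene without touching the prompt template.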
2. Response Parsing is Underrated
LLMs are chatty. Games need structured output. We use simple tag extraction:
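// `raw` is the model's reply. Grab the first emotion and action tag (match() returns null if absent).
const emotionMatch = raw.match(/\[emotion:(\w+)\]/i);
const actionMatch = raw.match(/\[action:([^\]]+)\]/i);
// Strip every tag so only the spoken dialogue remains.
const text = raw.replace(/\[(emotion|action):[^\]]*\]/gi, '').trim();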
This gives us clean dialogue text plus metadata for animation triggers.
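Wrapped into a single function, the same extraction might look like the sketch below. The return shape, the `'neutral'` fallback, and the whitespace cleanup are assumptions added for the example.

```javascript
// Sketch of a parser built on the tag regexes above. The return shape and
// the 'neutral' default are assumptions, not a fixed format.
function parseResponse(raw) {
  const emotionMatch = raw.match(/\[emotion:(\w+)\]/i);
  const actionMatch = raw.match(/\[action:([^\]]+)\]/i);
  const text = raw
    .replace(/\[(emotion|action):[^\]]*\]/gi, '')
    .replace(/\s{2,}/g, ' ') // collapse gaps left where a tag sat mid-sentence
    .trim();
  return {
    text,
    emotion: emotionMatch ? emotionMatch[1].toLowerCase() : 'neutral',
    action: actionMatch ? actionMatch[1].trim() : null,
  };
}

const parsed = parseResponse(
  '[emotion:happy] Welcome back, friend! [action:wave] Care to trade?'
);
// parsed.text    -> 'Welcome back, friend! Care to trade?'
// parsed.emotion -> 'happy'
// parsed.action  -> 'wave'
```

Returning defaults instead of `null` matches keeps the game-side code simple: an NPC whose reply has no tags still gets a usable emotion value.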
3. Latency Matters More Than Quality
Players won't wait 3 seconds for an NPC to respond. We target <500ms total latency. This means:
- Streaming responses (display text as it generates)
- Smaller models for non-critical NPCs
- Aggressive caching of common responses
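For the caching point, a minimal sketch might key responses by NPC and normalized player input, with a time-to-live so lines do not go stale. Everything here (the `ResponseCache` name, the normalization rules, the TTL) is an assumption for illustration.

```javascript
// Hypothetical response cache keyed by NPC id and normalized player input.
class ResponseCache {
  constructor(ttlMs = 60000) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  // Normalize so that "Hello!!" and "  hello " map to the same key.
  static key(npcId, input) {
    const normalized = input
      .toLowerCase()
      .replace(/[^\p{L}\p{N}\s]/gu, '') // drop punctuation, keep letters and digits
      .replace(/\s+/g, ' ')
      .trim();
    return `${npcId}:${normalized}`;
  }

  get(npcId, input, now = Date.now()) {
    const k = ResponseCache.key(npcId, input);
    const hit = this.entries.get(k);
    if (!hit) return undefined;
    if (now - hit.storedAt > this.ttlMs) {
      this.entries.delete(k); // entry expired
      return undefined;
    }
    return hit.response;
  }

  set(npcId, input, response, now = Date.now()) {
    this.entries.set(ResponseCache.key(npcId, input), { response, storedAt: now });
  }
}

const cache = new ResponseCache(1000);
cache.set('goron', 'Hello!!', 'Well met, traveler!', 0);
const cachedHit = cache.get('goron', '  hello ', 500);   // within TTL
const cachedExpired = cache.get('goron', 'hello', 2000); // past TTL
```

A cache like this probably only makes sense for inputs whose answer does not depend on conversation state, such as greetings; anything that relies on history or memory would need that context in the key or should skip the cache.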
4. Conversation History Windowing
Sending the full conversation history is expensive and slow. We window to the last 10 exchanges, with a separate memory system for important facts.
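// 10 exchanges = 20 messages (player + NPC); drop the oldest pair until we are back under the cap.
while (history.length > 20) history.splice(0, 2);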
Simple, effective, cheap.
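To show how the window and the separate fact memory could come together in one request, here is a sketch. It assumes the `{ role, content }` message shape used by common chat APIs; `recordExchange`, `buildMessages`, and the "Things you remember" wording are hypothetical.

```javascript
const MAX_EXCHANGES = 10; // from the text: keep the last 10 exchanges

// Append one player/NPC exchange, then trim the oldest pairs past the window.
// The { role, content } shape is an assumption borrowed from common chat APIs.
function recordExchange(log, playerText, npcText) {
  log.push({ role: 'user', content: playerText });
  log.push({ role: 'assistant', content: npcText });
  while (log.length > MAX_EXCHANGES * 2) log.splice(0, 2);
  return log;
}

// Assemble a request: system prompt, remembered facts, then the windowed log.
function buildMessages(systemPrompt, facts, log) {
  const memoryNote = facts.length
    ? `\nThings you remember:\n- ${facts.join('\n- ')}`
    : '';
  return [{ role: 'system', content: systemPrompt + memoryNote }, ...log];
}

const npcLog = [];
for (let i = 1; i <= 12; i++) {
  recordExchange(npcLog, `player line ${i}`, `npc line ${i}`);
}
const messages = buildMessages('You are Goron.', ['Player owes 5 gold'], npcLog);
```

After twelve exchanges the log holds only exchanges 3 through 12, while the debt fact survives because it lives in the system message rather than in the window.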
Cost Reality Check
For a game with 1000 daily active players, each talking to 5 NPCs per session:
- GPT-4o-mini: ~$2-5/day
- DeepSeek V3: ~$0.50-1/day
- Self-hosted 7B: ~$0 extra (assuming spare capacity on an existing game server)
For indie games, the economics work. It's not free, but it's cheaper than hiring a dialogue writer for every language.
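The arithmetic behind estimates like these is easy to reproduce for your own game. In the sketch below, only the player and NPC counts come from the scenario above; the exchange count, token sizes, and per-token prices are placeholder assumptions, not measurements or quoted provider pricing, and it assumes one session per player per day.

```javascript
// Back-of-envelope daily cost. Token counts and prices are placeholders.
function estimateDailyCost({
  dailyPlayers,
  npcsPerSession,
  exchangesPerNpc,
  inputTokensPerRequest,
  outputTokensPerRequest,
  usdPerMillionInputTokens,
  usdPerMillionOutputTokens,
}) {
  const requests = dailyPlayers * npcsPerSession * exchangesPerNpc;
  const inputCost =
    ((requests * inputTokensPerRequest) / 1e6) * usdPerMillionInputTokens;
  const outputCost =
    ((requests * outputTokensPerRequest) / 1e6) * usdPerMillionOutputTokens;
  return inputCost + outputCost;
}

const dailyUsd = estimateDailyCost({
  dailyPlayers: 1000,             // from the scenario above
  npcsPerSession: 5,              // from the scenario above
  exchangesPerNpc: 4,             // assumption
  inputTokensPerRequest: 400,     // assumption: prompt + windowed history
  outputTokensPerRequest: 60,     // assumption: replies under 3 sentences
  usdPerMillionInputTokens: 0.2,  // placeholder price
  usdPerMillionOutputTokens: 0.8, // placeholder price
});

console.log(`$${dailyUsd.toFixed(2)} per day`);
```

With these placeholder inputs the estimate comes to about $2.56 per day. Input tokens dominate because every request resends the system prompt and history, which is one more reason to keep the window short.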
Open Questions We're Still Working On
- Consistency — How do you keep an NPC's personality stable across thousands of conversations?
- Multilingual — Supporting 5+ languages without maintaining 5x the prompts
- Voice — Combining LLM dialogue with real-time TTS (we're experimenting with this)
Try It
We have a live demo on our website where you can talk to NPCs powered by our engine. It's running a real inference backend, not canned responses.
If you're building a game and want to experiment with AI NPCs, we're in early access and happy to chat.
Vantage Digital Labs builds AI tooling for game teams. vantage-digital.online