惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

P
Privacy International News Feed
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
Jina AI
Jina AI
T
Tailwind CSS Blog
WordPress大学
WordPress大学
Scott Helme
Scott Helme
C
Cybersecurity and Infrastructure Security Agency CISA
博客园 - Franky
C
CERT Recently Published Vulnerability Notes
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
雷峰网
雷峰网
Schneier on Security
Schneier on Security
博客园 - 聂微东
T
Tor Project blog
Hugging Face - Blog
Hugging Face - Blog
博客园 - 司徒正美
AI
AI
T
Troy Hunt's Blog
Security Latest
Security Latest
T
The Blog of Author Tim Ferriss
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
C
Check Point Blog
T
Threat Research - Cisco Blogs
W
WeLiveSecurity
V
Vulnerabilities – Threatpost
Recorded Future
Recorded Future
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Cisco Talos Blog
Cisco Talos Blog
C
CXSECURITY Database RSS Feed - CXSecurity.com
Cloudbric
Cloudbric
J
Java Code Geeks
罗磊的独立博客
C
Cyber Attacks, Cyber Crime and Cyber Security
aimingoo的专栏
aimingoo的专栏
L
LangChain Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
P
Privacy & Cybersecurity Law Blog
Google DeepMind News
Google DeepMind News
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
L
Lohrmann on Cybersecurity
I
InfoQ
MongoDB | Blog
MongoDB | Blog
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
The GitHub Blog
The GitHub Blog
The Hacker News
The Hacker News
H
Help Net Security
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
P
Proofpoint News Feed
N
News and Events Feed by Topic

DEV Community

Authentication Security Deep Dive: From Brute Force to Salted Hashing (With Java Examples) Why AI Systems Don’t Fail — They Drift Spilling beans for how i learn for exam😁"Reinforcement Learning Cheat Sheet" I Replaced Chrome with Safari for AI Browser Automation. Here's What Broke (and What Finally Worked) How Python Borrows Other People's Work The $40 Architecture: Processing 1 Billion API Requests with 99.99% Uptime Vibe Coding: A Workflow Guide (From Zero to SaaS) Most webhook security guides protect the wrong side. The scary part is delivery. Headless CMS for TanStack Start: Build a Blog with Cosmic EU Age Verification App "Hacked in 2 Minutes" — What Actually Happened Comfy Cloud’s delete function does not actually remove files Running AI Models on GPU Cloud Servers: A Beginner Guide Event-driven media intelligence with AWS Step Functions and Bedrock I scored 500 AI prompts across 8 quality dimensions — here's what broke How to Call Google Gemini API from Next.js (Free Tier, No Backend Needed) The Portal Protocol: Reclaiming Human Connection in the Age of AI How to Fix Your Team's Scattered Knowledge Problem With a Self-Hosted Forum Intro to tc Cloud Functors: A Graph-First Mental Model for the Modern Cloud Designing Multi-Tenant Backends With Both Ownership and Team Access I Built a Neumorphic CSS Library with 77+ Components — Here's What I Learned PostgreSQL Performance Optimization: Why Connection Pooling Is Critical at Scale Cómo construí un SaaS multi-rubro para gestionar expensas en Argentina con FastAPI + Vue 3 🚀 I Built an Ethical Hacking Scanner Tool – Open Source Project I Replaced /usage and /context in Claude Code With a Single Statusline A Pythonic Way to Handle Emails (IMAP/SMTP) with Auto-Discovery and AI-Ready Design I Collected 8.9 Million Polymarket Price Points — Here's What I Found About How Markets Really Move EcoTrack AI — Carbon Footprint Tracker & Dashboard Everyone's Using AI. No One Agrees How. 5 self-hosted ebook managers worth trying in 2026 Building Your First AI Agent with LangChain: From Chatbot to Autonomous Assistant Common SOC 2 Failures (Real World) Stop Vibe-Checking Your AI App: A Practical Guide to Evals How to Use SonarQube and SonarScanner Locally to Level Up Your Code Quality Your Next To-Do App Is Dead — I Replaced Mine with an OpenClaw AI Sign a Nostr event in 60 lines of Python using coincurve — no nostr-sdk, no nbxplorer, no rust toolchain ITGC Audit Explained Like You’re in Big 4 Patch Tuesday abril 2026: Microsoft parcha 163 vulnerabilidades y un zero-day en SharePoint Stop scraping everything: a better way to track competitor price changes Listing on MCPize + the Official MCP Registry while routing payments OUTSIDE the marketplace — how I kept 100% of my x402 revenue Building an AI-Powered Risk Intelligence System Using Serverless Architecture Why We Ripped Function Overloading Out of Our AI Toolchain Testing AI-Generated Code: How to Actually Know If It Works SaaS Churn Is Killing Your Business. Here Is What to Do About It (Without a Support Team) The Speed of AI Is No Longer Linear - And Self-Improving Models Are Why How to Implement RBAC for MCP Tools: A Practical Guide for Engineering Teams From Standard Quote to Persuasive Proposal: AI Automation for Arborists I built a CLI that scaffolds complete multi-tenant SaaS apps Axios CVE-2025–62718: The Silent SSRF Bug That Could Be Hiding in Your Node.js App Right Now The dashboard that ended our friendship Data Pipelines Explained Simply (and How to Build Them with Python) The Hidden Cost of AI Systems Nobody Talks About. undefined vs undeclared, and how typeof behaves Switching from file-based jobs to NATS/Kafka in Rust without changing code io_uring Adventures: Rust Servers That Love Syscalls Why Agentic AI is Killing the Traditional Database The POUR principles of web accessibility for developers and designers Quantum Neural Network 3D — A Deep Dive into Interactive WebGL Visualization How To Install Caveman In Codex On macOS And Windows Automation Pipeline Reliability: Why Your Workflow Breaks When Nobody Is Watching I Built an 'Open World' AI Coding Agent — It Works From ANY Folder From Freelancing to Product: A Tech Service Company's SaaS Transformation China's AI Giants: Adding Tencent Hunyuan & ByteDance Doubao to AI University (74 Providers) On the Vibe Coders and Their Lies clerk: Auto-Summarize Your Claude Code Sessions AI Weekly — 2026/04/10–04/17 | The Model Lockdown Is Here, but the Toolchain Is the Real Battleground AI 週報 — 2026/04/10–2026/04/17 模型封鎖潮來了,但工具鏈才是真戰場 Maybe this is how Open-Source apps are born... 🚀 Fine-Tune LLMs with LoRA and QLoRA: 2026 Guide tRPC v11 + Next.js App Router: End-to-End Type Safety Without the Boilerplate ShadCN UI in 2026: Why I Stopped Installing Component Libraries and Started Owning My Components SaaS Billing in React Server Components: Stripe + Supabase Without a Single `useEffect` Join our DEV Weekend Challenge — $1,000 in Prizes Across TEN winners! Submissions Due April 20 at 6:59 AM UTC. Implementing FSRS Spaced Repetition in Flutter + Supabase — Adding Memory Science to an AI Learning App "I Texted My Localhost From the Train — Claude Code Fixed the Bug Before I Got Home" I Built a Sales Prep AI and It Went Deeper Than Expected Design to Code #2: One JSON, Eleven Outputs Solving the 100M-Row Problem: A Summary Table Pattern for High-Volume Push Notification Logs Flutter Web With Wasm: What Actually Changes For Developers I Built 50 Royalty-Free Soundtracks for My Side Project in a Weekend Using AI Music Generation The Vibe Coding Security Checklist: 7 Things to Check Before You Ship Stop Letting Googlebot Guess Fix Your React App's SEO Right Desconstruindo o Streaming do LinkedIn: Como Criar um Engine de Extração de Vídeo de Alta Performance com HLS e FFmpeg (EDA Part-1) EDA (Exploratory Data Analysis) Explained With Real Life — Why Looking at Your Data Is the Most Important Step in Machine Learning Brand Relationship Management at Scale: Our 4-Touch Outreach System for 200+ Brands Why String.fromEnvironment() Might Return an Empty String in Dart JGuardrails 1.0.0 — Hardening Java LLM Apps Against Jailbreaks, Toxicity, and Prompt Injection Plan and Schedule a Full Week of Threads Content From One Claude Conversation Coding Cat Oran Ep3, Five Tables Changed Everything Updated: BFF Pattern I'm done watching freelancers get buried by 200 proposals. So I'm building the alternative. This is my first post BFS Algorithm in Java Step by Step Tutorial with Examples Tracking LLM Pricing Monthly: An Open Dataset for 22 AI Models How We Measure Content ROI on a Comparison Site: Revenue Attribution Without Perfect Data Introducing Nova AI Ops: The AI-Native Operating System for SRE Teams I built a free desktop video downloader for Windows — Grabbit How Talkie OCR Helps Vision-Impaired & Dyslexic Users Read the World Around Them VRCFaceTracking安装和iPhone面捕配置教程,有bug Even CrowdStrike Can't See Your Agents The Automation Gold Rush: What n8n Workflows and Claude Are Opening Up for Developers Right Now
I Used Gemma 4 to Simulate an Entire Emergency Command Team -- One Model, Six Roles, Real Doctrine
Kerry Kier · 2026-05-13 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

I work in IT infrastructure for a fire and EMS communications center. I'm also a CERT member. I'm not an Emergency Operations Manager, but I work close enough to that world to understand what tabletop exercises actually cost in time and coordination. Getting six trained ICS personnel into a room at the same time, playing their roles correctly, staying in doctrine, for a discussion-based exercise that might run two hours, that's a significant logistical lift. For smaller agencies or training programs with limited staff, it often just doesn't happen.

That's the gap I wanted to close.

The ICS Tabletop Exercise Simulator is a Gemma 4-powered system that lets an Emergency Operations Manager run a fully staffed ICS tabletop exercise without coordinating a room full of people. The model simultaneously portrays six ICS positions: Incident Commander, Safety Officer, Public Information Officer, Operations Section Chief, Planning Section Chief, and Logistics Section Chief. Every response is grounded in NIMS 2017 doctrine, NQS Position Task Books, and ICS position checklists. Nothing is invented. If a behavior or authority isn't in the doctrine, it doesn't appear in the simulation.

This runs entirely through OpenWebUI with a structured workspace system prompt and a RAG knowledge base containing the official FEMA source documents. There's no custom app, no web development, no agent framework. The interface is a chat window. An EOM describes a scenario, and the simulator responds with every relevant position in ICS format, enforcing chain of command, communication protocols, and position-specific decision authorities.

I want to be direct about what this is. A proof of concept built by someone who supports the infrastructure that emergency management runs on, not by an EOM. I did my best to ground everything in doctrine and had the RAG pipeline pulling from official FEMA documents to keep me honest. But this is a first build, and I'm saying that upfront.

The architecture in one paragraph: A self-hosted server runs OpenWebUI in Docker behind a LiteLLM proxy. The proxy routes inference to the Gemini API for Gemma 4 access. RAG uses ChromaDB for vector storage, bge-m3 for embeddings via local Ollama, and BAAI/bge-reranker-v2-m3 in a TEI container for hybrid search reranking. The knowledge base contains 148 documents converted to clean Markdown: NIMS 2017, NRF 4th Edition, HSEEP 2020, NQS Position Task Books for all six ICS positions, ICS forms, training course manuals, and HSEEP exercise templates.

The behavior that makes it useful:

The system prompt enforces ICS communication protocols precisely, not approximately. The Safety Officer has unilateral stop-work authority without IC approval, because that's what NIMS says. The Planning Section Chief can communicate directly with section chiefs for information gathering, but cannot issue directives. The PIO holds all public messaging for IC approval before release. The OSC and LSC route all coordination through the IC. These rules are pulled directly from the position task books and encoded as hard constraints in the prompt.

The system also implements a source authority hierarchy. NIMS 2017, NQS Position Task Books, and ICS checklists are Tier 1 (authoritative). Course manuals are Tier 2 (supplementary). HSEEP templates are Tier 3 (reference only, not doctrine). When a PTB and a course manual both cover the same content, the PTB is cited. Exercise templates are never cited as doctrine. This hierarchy shapes how the model retrieves and represents source material.

A facilitator command set is built in. An EOM prefixes a message with // to step out of the simulation. Commands include // POSITION QUERY: [position] -- [question] to query a single position directly, // STATUS REPORT to get a one-paragraph status from every position, // DECISION POINT to pause for a structured discussion summary, // UPDATE to add scenario detail without advancing time, and // RESET to clear the scenario. The selective response logic means asking the OSC a direct question returns the IC and OSC only, not six responses when three of them have nothing to say.

Where the build genuinely earns its keep:

Rapid scenario iteration. An EOM can run a full six-position inject response in seconds, adjust the scenario, and run it again. What used to require scheduling six people now happens alone at a desk.

Doctrinal friction. The most valuable learning outcome of a tabletop exercise is when positions conflict, when the SO's stop-work authority collides with the OSC's tactical urgency. The system portrays that friction accurately rather than smoothing it over. In one test, the SO explicitly prevented an interior fire attack citing unverified structural integrity, the OSC escalated the resource gap to the IC, and the IC had to manage both simultaneously. That's the kind of decision-point pressure that makes exercises useful.

Position-specific training. The // POSITION QUERY command lets an EOM ask any position a direct doctrine question mid-exercise. Useful for both exercise facilitation and individual position study.

What already exists in this space:

I checked the market carefully before committing to this. Preppr.ai, EM1, Disaster Tech PRATUS, and Juvare are all serious commercial players in adjacent positions. ThreatGEN AutoTableTop does AI-automated tabletop exercises but for cybersecurity only. None of them do what this does: a single model simulating all six ICS positions, grounded in NQS Position Task Books, for solo practice by a single EOM. Preppr explicitly positions against the solo use case ("exercise design isn't a content problem, it's a coordination problem"). That's either a market gap or a market signal that the use case isn't wanted. I think it's the former, especially for smaller agencies and individual training. The honest framing is that this complements team-oriented platforms rather than competing with them.

Demo

The demo shows a structure fire scenario inject triggering a full six-position ICS response, followed by a // DECISION POINT facilitator command pausing exercise play for structured discussion. The simulation runs entirely in OpenWebUI with no custom app or interface, just a chat window and a system prompt.

Code

All configuration files are in the repository:

https://github.com/kkierii/ics-ttx-simulator

The repo contains:

  • system-prompt.md -- the full OpenWebUI workspace system prompt, including role definitions, communication protocols, source authority hierarchy, facilitator command handling, response format, and behavioral rules
  • config.yaml -- LiteLLM proxy configuration including the Gemma 4 model entry and embedding/reranker routes
  • openwebui-compose.yml -- Docker Compose for OpenWebUI

The system prompt is the primary artifact. It's what took the most iteration and the most doctrine research to get right. The behavior of the simulator lives almost entirely in that one file.

How I Used Gemma 4

I used gemma-4-26b-a4b-it, the 26B Mixture-of-Experts model, accessed via the Gemini API through a LiteLLM proxy.

The model choice wasn't arbitrary. The MoE architecture activates approximately 4B parameters per token while routing through 26B total parameters. For a workload that requires simultaneously holding six distinct role identities with different authorities, communication rules, and knowledge domains, MoE is a better fit than a dense model of equivalent size. A 31B dense model would be slower and more expensive per token with no quality advantage for this specific task. The MoE routing means the model can efficiently specialize per-token, which matters when it's switching between the IC framing incident objectives and the SO assessing stop-work conditions in the same response.

The 26B parameter pool also gives the model enough capacity to maintain doctrinal fidelity across complex multi-position responses. I tested this throughout development by running position-specific queries against the RAG knowledge base and checking results against the source PTBs. The model didn't confuse position authorities. It didn't have OSC making public information decisions. It didn't have LSC tasking Operations. It stayed in lane.

I also chose API deployment over local inference for a specific reason. This is how emergency management agencies and their vendors actually operate. A stack that requires a local GPU capable of running a 26B model puts this out of reach for most small agencies. API deployment, routed through an open-source proxy, means the same system prompt and knowledge base could be moved to a different inference provider or eventually to on-premises deployment as hardware becomes accessible, without changing the application layer.

Now, the parts that didn't go smoothly.

The RAG retrieval ranking problem. Even with the TEI reranker in the stack, course manuals consistently ranked above the authoritative Position Task Books for position-specific queries. The responses were doctrinally correct because the model knows the content, but citations pointed to training course materials rather than PTBs. The reason is semantic. PTBs are written in formal NIMS task language. Course manuals use plain instructional language that maps more naturally to how a question gets phrased. The embedding model scores semantic similarity and the course manuals win on that metric even when the PTBs carry higher authority. I mitigated this with the source authority hierarchy in the system prompt, which influenced the model's citation reasoning but couldn't override the retrieval ranking. The embedding layer runs before the model sees anything. Full resolution would require either a domain-specific embedding model trained on government technical documentation, or a custom reranking approach that weights document metadata. For a prototype this is acceptable. The answers are right. In a production deployment where citation accuracy is a compliance requirement, this is the next thing to solve.

The document conversion step mattered more than expected. Original documents were PDF, DOCX, and PPTX. OpenWebUI's default extractors produced garbled table text from ICS forms, fragmented bullet content from training slides, and merged columns from multi-column doctrine PDFs. Early testing produced one-sentence responses to substantive position queries despite correct source retrieval. After converting everything to clean Markdown using pymupdf4llm for PDFs, python-pptx for slide decks, and python-docx for Word documents, the same queries returned structured multi-point responses with correct form numbers and doctrine citations. The conversion fixed the core retrieval problem before any model tuning was needed.

The thinking loop. During testing I ran into a consistent issue with the most complex injects, specifically scenarios that require all six positions to respond simultaneously with significant doctrinal load, like a firefighter mayday with a stop-work trigger. The model would enter an extended internal reasoning loop, running self-correction passes against the system prompt rules before generating output. In some cases the reasoning ran long enough to hit timeout limits before the response arrived.

I tried several things. Setting reasoning_effort to 0 in OpenWebUI. Adding a budget_tokens cap in the LiteLLM Gemini provider config. Adding a RESPONSE DISCIPLINE block to the system prompt instructing the model to write immediately without pre-checking. Increasing the OpenWebUI client timeout via AIOHTTP_CLIENT_TIMEOUT. None of them fully resolved it for the hardest injects. The thinking loop is collapsible in OpenWebUI and not visible to the EOM by default, so it doesn't break the interface, but a response that times out is a real problem in a live exercise.

I'm not certain whether this is a model behavior issue, a LiteLLM passthrough issue where the reasoning parameters aren't reaching the Gemini API correctly, or something in my own configuration. It may be all three. Simpler injects complete reliably and cleanly. The issue surfaces specifically at maximum complexity, which in a real exercise would be the moments that matter most.

I'm documenting this because someone else building with Gemma 4 in a similar configuration should know it exists. And because pretending a first build has no rough edges doesn't help anyone.

What this project showed me: A single well-structured system prompt with a properly tiered RAG knowledge base can produce doctrinally accurate, role-specific simulation responses that would be genuinely useful for ICS training. The architecture is sound. The limiting factor right now is inference configuration, not the model's capability. When the reasoning is contained to simpler injects, the output quality is exactly what I was hoping for. Phase 2 would add Finance/Administration Section and subordinate positions. The system prompt architecture was explicitly designed for that expansion.

This was my first attempt at building something in this space. I'm an IT infrastructure person who cares about emergency management. I built something that I think has real value, ran into real problems, documented both honestly, and shipped it anyway. That feels about right.