NovelPilot: A Novel Writing Agent Powered by Gemma 4
Doraking · 2026-05-23 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

Most AI story generators work like this:

prompt in → wall of text out

That is useful, but it does not feel like a real writing process.

When people write fiction, they do not only generate paragraphs. They plan the premise, design characters, build the world, structure the plot, manage foreshadowing, write scenes, edit style, check continuity, and prepare the final piece for readers.

So I built NovelPilot.

NovelPilot is a Gemma 4-powered AI writing room that turns one prompt into a complete story creation pipeline.

One prompt goes in.

Nine agents start working.

A finished story comes out.


What I built

NovelPilot is a web app that helps users create short fiction through a structured multi-agent workflow.

The user starts with a simple prompt, such as:

Write a melancholic sci-fi mystery set in modern Tokyo. A graduate student who lost his memory investigates a disappearance in a quantum computing lab.

Then NovelPilot launches a sequence of specialized AI agents:

  1. Premise Architect
  2. Character Director
  3. World Builder
  4. Plot Strategist
  5. Chapter Architect
  6. Prose Writer
  7. Style Editor
  8. Continuity Detective
  9. Publisher Agent

Each agent performs a specific part of the writing process.

The result is not just a generated story. It is a full creative package:

  • Story concept
  • Character profiles
  • Worldbuilding notes
  • Plot structure
  • Chapter outline
  • Chapter 1 draft
  • Style editor report
  • Foreshadowing tracker
  • Continuity detective report
  • Title ideas
  • Publication summary
  • Browser reading mode
  • Polished PDF export

NovelPilot is designed to demonstrate Gemma 4 as a multi-agent creative reasoning engine, not just a text completion model.


Demo

Live demo: https://novelpilot.vercel.app

How to try it:

  1. Open the live demo.
  2. Click Run Judge Demo.
  3. Watch the nine-agent pipeline complete.
  4. Read the finished novel in the browser.
  5. Review the Foreshadowing Tracker and Continuity Detective.
  6. Download the final story as a polished PDF.

The Judge Demo works without an API key, so reviewers can test the full experience immediately.

For live generation, NovelPilot supports Gemma 4 through a provider abstraction, with OpenRouter as the recommended provider.


Sample prompt and output

Here is the sample prompt I used to test NovelPilot.

The protagonist is Ren Kanzaki, a 24-year-old graduate student working in a quantum computing laboratory. A few days ago, he lost part of his memory. He cannot remember what he was researching, why his professor suddenly disappeared, or why his own name appears in an old experimental log.

The story begins on a rainy night in Tokyo. Ren enters the university research building after midnight and finds an old experiment log hidden inside a locked drawer. On the final page, he sees the sentence:

“Ren Kanzaki will be removed from the observation target as of today.”

The story should focus on quiet tension, memory gaps, emotional unease, and the unsettling atmosphere of the laboratory. Avoid flashy action. Let the mystery emerge through scenery, silence, dialogue, and small contradictions.

Main theme:
If memories disappear, can a person still remain the same self?

Main characters:
- Ren Kanzaki: A graduate student who lost part of his memory. Calm and intelligent, but emotionally repressed.
- Mio Shiraishi: Ren’s labmate. She knows something about Ren’s memory loss but refuses to tell him the truth.
- Professor Kuon: The missing professor. He was researching quantum memory transfer.
- Associate Professor Kurosaki: The person currently managing the laboratory. He seems helpful, but some of his statements contradict the records.

Tone:
Intellectual, quiet, melancholic, slightly literary, and mysterious.


  • Language: en
  • Genre: sci-fi
  • Tone: melancholic
  • Target Length: short-story

I also exported the generated story as a polished PDF.

Sample output PDF: Download the generated novel PDF

This PDF was generated directly from NovelPilot’s finished reader view.


Code

GitHub repo: https://github.com/dorakingx/novelpilot

Tech stack:

  • Next.js App Router
  • TypeScript
  • Tailwind CSS
  • shadcn/ui-style components
  • Gemma 4 provider abstraction
  • OpenRouter-compatible live mode
  • Mock mode for the zero-setup judge demo
  • Browser-based polished PDF export
  • Vercel deployment

The app has two main modes:

Mode | Purpose
Demo / Mock Mode | Lets judges try the full workflow without an API key
Live Mode | Uses Gemma 4 through the configured provider

The provider layer is intentionally isolated in lib/gemma.ts, so the model provider can be changed without rewriting the app.
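
A minimal sketch of what such a provider boundary could look like. This is not the actual contents of lib/gemma.ts: the names `GemmaProvider`, `LiveCall`, `createMockProvider`, `createLiveProvider`, and `pickProvider` are hypothetical, and the real network call is injected as a function so that no specific HTTP API is assumed.

```typescript
// Hypothetical sketch of a provider boundary; not the actual lib/gemma.ts.
interface GemmaProvider {
  name: string;
  // Resolves to the raw text produced for one agent's prompt.
  generate(agentId: string, prompt: string): Promise<string>;
}

// Signature of whatever function performs the real network call (assumed).
type LiveCall = (apiKey: string, prompt: string) => Promise<string>;

// Mock provider: serves canned responses keyed by agent id, so no key is needed.
function createMockProvider(canned: Record<string, string>): GemmaProvider {
  return {
    name: "mock",
    generate: async (agentId) => canned[agentId] ?? "{}",
  };
}

// Live provider: delegates to the injected call, keeping HTTP details out of the app.
function createLiveProvider(apiKey: string, callLive: LiveCall): GemmaProvider {
  return {
    name: "live",
    generate: (_agentId, prompt) => callLive(apiKey, prompt),
  };
}

// Fall back to mock mode whenever no API key is configured.
function pickProvider(
  apiKey: string | undefined,
  canned: Record<string, string>,
  callLive: LiveCall,
): GemmaProvider {
  return apiKey ? createLiveProvider(apiKey, callLive) : createMockProvider(canned);
}
```

With a boundary like this, the rest of the app only ever calls `generate`, so swapping providers means changing one factory rather than every agent.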


How I used Gemma 4

Gemma 4 is the reasoning engine behind the multi-agent writing pipeline.

NovelPilot uses Gemma 4 for:

  • structured story concept generation
  • character design
  • worldbuilding
  • plot planning
  • chapter outlining
  • prose drafting
  • style editing
  • foreshadowing tracking
  • continuity auditing
  • publisher copy generation

Each agent receives the accumulated story bible and previous structured outputs.

This means Gemma 4 is not just generating paragraphs. It acts as the structural memory and reasoning layer for the whole novel creation process.

The important design decision was to make every agent return structured data whenever possible. That allows the UI to render the model output as real product features: timelines, cards, reports, trackers, reader views, and exports.
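
Treating model output as structured data usually means parsing defensively, since a model may wrap its JSON in extra prose. The sketch below shows one way to do that; `parseAgentJson` and its `requiredKeys` parameter are illustrative assumptions, not functions from the NovelPilot repository.

```typescript
// Hypothetical helper: extract and validate a JSON object from raw model text.
function parseAgentJson(
  raw: string,
  requiredKeys: string[],
): Record<string, unknown> | null {
  // Keep only the span from the first "{" to the last "}" so stray prose
  // around the JSON does not break parsing.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let value: unknown;
  try {
    value = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null; // not valid JSON
  }

  // Only plain objects count as a structured agent result.
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return null;
  }
  const record = value as Record<string, unknown>;

  // Reject results that are missing any field the UI expects to render.
  return requiredKeys.every((key) => key in record) ? record : null;
}
```

Returning `null` instead of throwing lets the caller decide whether to retry the agent or show an error card.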


Why I chose this Gemma 4 model

For the live version, NovelPilot is designed to use a Gemma 4 model through OpenRouter.

I chose this approach because the app needs strong reasoning and structured generation across multiple steps. The model must follow JSON schemas, preserve context from earlier agents, and reason about story structure, character consistency, and foreshadowing.

NovelPilot focuses especially on:

  • long-context creative reasoning
  • structured JSON generation
  • story memory across multiple steps
  • continuity checking
  • literary planning and drafting

Gemma 4 is a good fit because the project is not only asking the model to write a paragraph. It asks the model to behave as a coordinated writing room.


What makes NovelPilot different

Most AI writing tools generate text.

NovelPilot generates a writing process.

The user does not only receive a draft. They see how the story is built:

Prompt
  ↓
Premise
  ↓
Characters
  ↓
World
  ↓
Plot
  ↓
Chapter outline
  ↓
Draft
  ↓
Style edit
  ↓
Continuity audit
  ↓
Publisher package
  ↓
Reader view
  ↓
PDF export


This makes the output easier to inspect, revise, and trust.


Key feature: Foreshadowing Tracker

One of my favorite parts is the Foreshadowing Tracker.

Instead of only writing a draft, NovelPilot tracks story threads like this:

{
  "item": "The cracked silver watch",
  "introducedIn": "Chapter 1",
  "status": "unresolved",
  "suggestedPayoff": "It reveals the exact time the protagonist's memory was overwritten.",
  "payoffChapter": "Chapter 3",
  "emotionalPurpose": "Connects guilt, identity, and lost time."
}


This makes the output more useful for writers.

It also shows why a structured model workflow matters. The app is not only asking Gemma 4 to write prose. It is asking Gemma 4 to reason about narrative structure.
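
As a rough illustration, a tracker entry like the one above could be typed and queried as follows. The field names come from the example; the `"resolved"` status value and the helper names `openThreads` and `payoffsByChapter` are assumptions made for this sketch.

```typescript
// Status values: "unresolved" appears in the example; "resolved" is assumed.
type ThreadStatus = "unresolved" | "resolved";

interface ForeshadowingItem {
  item: string;
  introducedIn: string;
  status: ThreadStatus;
  suggestedPayoff: string;
  payoffChapter: string;
  emotionalPurpose: string;
}

// Threads that still need a payoff somewhere in the story.
function openThreads(items: ForeshadowingItem[]): ForeshadowingItem[] {
  return items.filter((thread) => thread.status === "unresolved");
}

// Group open threads by the chapter where their payoff is planned.
function payoffsByChapter(items: ForeshadowingItem[]): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const thread of openThreads(items)) {
    const names = grouped.get(thread.payoffChapter) ?? [];
    names.push(thread.item);
    grouped.set(thread.payoffChapter, names);
  }
  return grouped;
}
```

Because each thread is data rather than prose, a view such as "what must Chapter 3 pay off?" becomes a simple lookup.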


Key feature: Continuity Detective

The Continuity Detective checks the generated story for structural problems.

It returns issues with:

  • category
  • severity
  • evidence
  • suggested fix

Example structure:

{
  "category": "foreshadowing",
  "severity": "high",
  "issue": "The experiment log is introduced as important but has no planned payoff.",
  "evidence": "The log appears in Chapter 1 and is referenced in the outline, but no chapter resolves its origin.",
  "suggestedFix": "Reveal in the final chapter that the log was written by an earlier version of the protagonist."
}


This was important to me because many AI writing tools can generate plausible fiction, but fewer tools help the user understand whether the story actually holds together.
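
A report made of issues in this shape is easy to rank before display. The sketch below assumes three severity levels; only `"high"` appears in the example, so `"medium"` and `"low"` are assumptions, as are the names `ContinuityIssue` and `sortBySeverity`.

```typescript
// "high" comes from the example; "medium" and "low" are assumed levels.
type Severity = "high" | "medium" | "low";

interface ContinuityIssue {
  category: string;
  severity: Severity;
  issue: string;
  evidence: string;
  suggestedFix: string;
}

// Lower rank sorts first, so the most severe issues lead the report.
const severityRank: Record<Severity, number> = { high: 0, medium: 1, low: 2 };

// Return a new array ordered by severity without mutating the input.
function sortBySeverity(issues: ContinuityIssue[]): ContinuityIssue[] {
  return [...issues].sort(
    (a, b) => severityRank[a.severity] - severityRank[b.severity],
  );
}
```

Copying the array before sorting keeps the original report order available, for example to show issues in chapter order in another view.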


Final reader experience

After all agents finish, NovelPilot automatically transitions into a Completed Novel Reader.

The user can read the finished story directly in the browser.

They can also go back to the Agent Workspace to inspect:

  • agent outputs
  • story bible
  • foreshadowing tracker
  • continuity report
  • publisher package

The final reader is not a one-way screen. Users can freely move between the production workflow and the finished novel.


PDF export

I also added polished PDF export.

Instead of relying on the browser’s default print layout, NovelPilot generates a designed A4-style manuscript PDF.

The PDF includes:

  • cover page
  • novel title
  • metadata
  • chapter title
  • formatted manuscript body
  • optional story notes

This makes the app feel closer to a complete writing product, not just a demo.


UI/UX design

I wanted the app to feel like an AI creative studio.

The flow has three stages:

1. Prompt Launcher

The first screen is focused.

The user only sees:

  • prompt input
  • language
  • genre
  • tone
  • target length
  • Generate Story
  • Run Judge Demo

This keeps the experience simple.

2. Agent Workspace

After generation starts, the app transitions into the agent workspace.

This screen shows:

  • active agent timeline
  • story bible
  • foreshadowing tracker
  • manuscript preview
  • continuity detective
  • export tools

3. Completed Novel Reader

When all agents finish, the app opens the final reading screen.

The user can read the story, download a PDF, or go back to review the agent outputs.


Technical architecture

The core architecture is simple:

app/page.tsx
  Main app phase control:
  launcher → workspace → reader

lib/useStoryProject.ts
  Client-side orchestration of the pipeline

app/api/generate-agent/route.ts
  Runs one agent per request

lib/gemma.ts
  Provider abstraction for Gemma 4 / OpenRouter / mock mode

lib/prompts.ts
  Prompt templates for each writing agent

lib/agents.ts
  Merges structured agent outputs into the Story Bible

lib/types.ts
  Shared TypeScript types

components/
  Prompt launcher, agent workspace, reader, trackers, reports, export panels


The app uses a state-first architecture: project state is held on the client while the pipeline runs, with no server-side persistence. Because this is a hackathon project, I intentionally avoided authentication, databases, and user accounts so the core experience stays fast and easy to judge.
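
The launcher → workspace → reader phase control can be pictured as a small state machine. The sketch below is one possible shape, not the code in app/page.tsx; the event names, the `AppState` type, and the `transition` function are all hypothetical.

```typescript
type Phase = "launcher" | "workspace" | "reader";

// Hypothetical events that move the app between phases.
type AppEvent =
  | "generationStarted"
  | "allAgentsFinished"
  | "showWorkspace"
  | "showReader";

interface AppState {
  phase: Phase;
  finished: boolean; // true once every agent has completed
}

// Pure transition function: events that do not apply leave the state unchanged.
function transition(state: AppState, event: AppEvent): AppState {
  switch (event) {
    case "generationStarted":
      return state.phase === "launcher"
        ? { phase: "workspace", finished: false }
        : state;
    case "allAgentsFinished":
      return state.phase === "workspace"
        ? { phase: "reader", finished: true }
        : state;
    case "showWorkspace":
      return state.phase === "reader"
        ? { phase: "workspace", finished: state.finished }
        : state;
    case "showReader":
      // The reader only opens once there is a finished story to show.
      return state.phase === "workspace" && state.finished
        ? { phase: "reader", finished: true }
        : state;
  }
}
```

Keeping the transitions in one pure function makes it straightforward to let users move back and forth between the workspace and the reader without losing track of whether the pipeline has completed.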


Agent workflow

Here is the high-level pipeline:

User Prompt
  ↓
Premise Architect
  ↓
Character Director
  ↓
World Builder
  ↓
Plot Strategist
  ↓
Chapter Architect
  ↓
Prose Writer
  ↓
Style Editor
  ↓
Continuity Detective
  ↓
Publisher Agent
  ↓
Completed Novel Reader + PDF Export


Each step builds on the previous one.

For example, the Character Director does not work from the original prompt alone. It receives the premise and theme created by the Premise Architect.

The Plot Strategist receives the concept, characters, and worldbuilding.

The Continuity Detective receives the story bible, chapter outline, draft, and previous reports.

This makes the app feel like an actual production pipeline rather than a single model call.
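
The accumulation described above can be sketched as a loop that hands each agent everything produced so far. This is an illustration only: the agent ids, the `StoryBible` shape, and `runPipeline` are assumptions, and the agent call is modeled as a synchronous injected function for brevity, whereas the real app runs one agent per request.

```typescript
// Hypothetical ids for the nine agents, in pipeline order.
const AGENT_ORDER = [
  "premise",
  "characters",
  "world",
  "plot",
  "chapters",
  "prose",
  "style",
  "continuity",
  "publisher",
] as const;

type AgentId = (typeof AGENT_ORDER)[number];

// The story bible maps each finished agent to its structured output.
type StoryBible = Partial<Record<AgentId, unknown>>;

// Injected agent call; receives the prompt plus all earlier outputs.
type RunAgent = (agent: AgentId, prompt: string, bible: StoryBible) => unknown;

function runPipeline(prompt: string, runAgent: RunAgent): StoryBible {
  let bible: StoryBible = {};
  for (const agent of AGENT_ORDER) {
    // Each agent sees only what earlier agents have already produced.
    const output = runAgent(agent, prompt, bible);
    // Build a new object per step so earlier snapshots stay unchanged.
    const next: StoryBible = { ...bible };
    next[agent] = output;
    bible = next;
  }
  return bible;
}
```

Because every step receives a fresh snapshot, an agent cannot accidentally modify the context that an earlier agent was given.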


What I learned

The biggest lesson was that, for creative tools, structured outputs are more useful than plain prose outputs.

A single prose response is hard to inspect.

But structured outputs can become:

  • timelines
  • cards
  • story bibles
  • trackers
  • reports
  • reader views
  • exports

I also learned that judge experience matters.

That is why I added Run Judge Demo. Reviewers can experience the full product without configuring an API key.

Another lesson was that a creative AI product should not end at “generation complete.” It should end with something the user can actually consume. That is why I added the final reader and PDF export.


Challenges

The biggest challenge was balancing autonomy and control.

If the app is too automatic, it feels like the user has no creative role.

If the app asks for too much input, it stops feeling agentic.

So I designed NovelPilot around this principle:

The AI agents do the heavy lifting, but the user can always review, regenerate, edit, read, and export.

Another challenge was making the final output feel complete. The Completed Novel Reader and PDF export helped turn the generated draft into something closer to a finished product.


What’s next

I would like to add:

  • full multi-chapter generation
  • persistent projects
  • local storage
  • streaming agent output
  • genre-specific prompt packs
  • vertical Japanese reading mode
  • richer PDF themes
  • user-editable story bible
  • side-by-side draft revision

Final thoughts

NovelPilot is my attempt to make AI fiction generation feel less like a chatbot and more like a writing room.

The core idea is simple:

One prompt. Nine agents. A complete story pipeline.

Gemma 4 is the reasoning engine behind the process. It plans, writes, edits, tracks foreshadowing, checks continuity, and packages the final story.

That is what makes NovelPilot more than a story generator.

It is an AI-powered novel production studio.