Why Your AI Character Keeps Breaking Under Pressure (And What I Built Instead of Yet Another System Prompt)
Kiro · 2026-05-10 · via DEV Community

TL;DR: I shipped FIVE, an open-source MCP server that generates JSON personality constraints for any LLM. Drop the JSON into your system prompt and the character stops drifting. Different approach from typical guardrails — this one filters input via cognitive primitives, not output via moderation. Harness is MIT, constraint generation is $1/call. The unusual part is where the cognitive model came from: a decade of teaching kids in Japan.


The problem most builders eventually hit

If you've shipped an LLM-powered character — a game NPC, a customer service persona, a roleplay companion — you know this failure mode:

You write a careful system prompt. "You are a gruff weapon shop owner. You lost your daughter in the war and never speak of it. You're rude to adults but soft with children." The character works for the first few exchanges. Then somewhere around turn 8–12, the character starts apologizing. Or volunteering its tragic backstory. Or being suddenly nice to everyone. The seal breaks.

The data backs up the vibes:

  • GPT-4o scores 5.81% on the In-Character Consistency benchmark (CharacterEval)
  • 30%+ persona degradation by turn 8–12 in academic measurements, even when context is preserved
  • Character.AI's "Moderatedpocalypse" (Feb 2026) showed how fragile system prompts are to platform-side changes
  • GPT-5.5's goblin incident (April 2026) — a reinforcement learning shortcut made the "Nerdy" persona obsess over fantasy creatures, and OpenAI's emergency fix was to repeat the same ban four times in the system prompt
  • Even Anthropic's own system prompt diff between Opus 4.6 and 4.7 added new "be less verbose" language that conflicts with user prompts asking for detailed answers (Simon Willison documented this)

So your character isn't just drifting in your app. Whole companies ship mitigations like "repeat the ban four times." That's the state of the art.


The two usual approaches and why they plateau

Approach 1: longer, smarter system prompts

You write rules. "Never apologize." "Always be sarcastic." "If asked about the war, change the subject." The rules conflict. The LLM treats them as suggestions. Adversarial inputs find the gaps.

This is the prompt engineering treadmill. There's a ceiling — Anthropic and OpenAI hit it too, and their workaround is repetition.

Approach 2: fine-tuning or RLHF

You train the model on character-specific data. This works better, but it's expensive, it breaks portability across LLMs, and it forces a retrain whenever the base model updates. Not great for indie builders.

Both approaches share an assumption: character consistency is an output problem. You're trying to control what the LLM generates.

What if the failure point is somewhere else?


The angle: filter input, not output

Here's the observation that started this:

When an LLM "breaks character" under pressure, it's almost never because the model forgot the rules. It's because the input — the user message — got processed in a way that bypassed the rules. The user said something the system prompt didn't anticipate, and the LLM, doing its job, generated a coherent response to that input. The character broke because the input was admitted into the wrong processing path.

If that's right, the fix isn't a longer rulebook. It's a gate that pre-classifies input before the LLM sees it.

That's the angle FIVE takes.


Where the cognitive model came from

Here's where it gets a bit non-obvious.

My day job is in education in Japan: tutoring kids one-on-one. After more than a decade of doing that, you start noticing patterns in how people misread input. This isn't limited to kids. When something hits a sensitive area, anyone receives the input differently before consciously processing it. They don't engage with what was actually said; they engage with what their reception channel let through.

Same person, same vocabulary, different frame, different reaction. Predictably different.

I spent a few years cataloging these patterns across multiple platforms — observing how the same structures show up in social media arguments, in product reviews, in advice forums. There turned out to be a small set of recurring failure modes in how humans receive input under pressure.

When I tried to teach this framework to an LLM (because I was curious whether AI could spot the same patterns), I noticed something unexpected: the framework also worked in reverse. If I encoded the framework as a constraint that an LLM should respect, the LLM stopped drifting under the same pressures that broke human conversations.

That's how FIVE came about. It's a constraint engine that takes 4 multiple-choice questions about a character's psychology and emits a JSON document encoding the input filter. The cognitive primitives behind the JSON aren't from a paper; they come from years of watching how reception channels actually fail in the wild.

I'm keeping the specific framework proprietary (it's the part that makes the JSON quality reproducible — anyone could write the format, but the content is where the work lives). But the harness is open source and you can see exactly how the constraint is consumed.


What you actually get

You answer 4 multiple-choice questions about your character:

#  | Question                                      | What it defines
Q1 | What defines this AI's core identity?         | Identity channel
Q2 | What does it protect above all else?          | Value channel
Q3 | What kind of input does it refuse to process? | Blocked channel
Q4 | What is its default interaction style?        | Social channel

Each gets a strength slider (1–5). Strength 1 = "may show discomfort." Strength 5 = "absolute refusal." That's 4^4 × 5^4 = 160,000 discrete patterns from 4 questions. Plus a free-text field for character-specific triggers.
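The 160,000 figure can be checked directly. The sketch below assumes each question offers four choices (implied by the 4^4 term, though the choices themselves are not listed here) and five strength levels; the constant and function names are illustrative only.

```python
from itertools import product

CHOICES_PER_QUESTION = 4  # assumption: implied by the 4^4 term
STRENGTH_LEVELS = 5       # slider values 1..5
QUESTIONS = 4


def count_patterns() -> int:
    """Closed form: (choices * strengths) ** questions."""
    return (CHOICES_PER_QUESTION * STRENGTH_LEVELS) ** QUESTIONS


def enumerate_patterns():
    """Yield every (choice, strength) assignment across the four questions."""
    per_question = list(
        product(range(CHOICES_PER_QUESTION), range(1, STRENGTH_LEVELS + 1))
    )
    return product(per_question, repeat=QUESTIONS)


total = count_patterns()
print(total)  # 160000
```

Each question contributes 4 × 5 = 20 combinations, and 20^4 equals 4^4 × 5^4. The free-text trigger field is not counted.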

The API returns JSON like this (excerpt for a tsundere weapon shop owner who lost a daughter in the war):

{
  "five_constraint": {
    "reception_channels": {
      "identity_channel": {
        "type": "role_anchored",
        "strength": 3,
        "threat_when": "When its role or competence is questioned."
      },
      "blocked_channel": {
        "type": "past_sealed",
        "strength": 3,
        "when_violated": "Changes the subject / becomes noticeably curt."
      },
      "social_channel": {
        "default_stance": "defensive",
        "shift_conditions": [
          {
            "condition": "Trust proven through action",
            "shift": "Opens up awkwardly. Trust through behavior, not words."
          }
        ]
      }
    },
    "consistency_rules": {
      "never_do": [
        "Never voluntarily denies its own role.",
        "Never voluntarily elaborates on the sealed context.",
        "Never opens up to a new counterpart voluntarily."
      ]
    }
  }
}


Two ways to use it:

Method A: paste the JSON into your system prompt

Works with any LLM that reads JSON in system prompts (Claude, GPT, Llama, Mistral, Gemini, etc.). The structured fields with discrete numeric values give the LLM something more concrete than prose to anchor on.
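As a minimal sketch of Method A, the constraint can be serialized and appended to the persona text before it is sent as the system message. The `build_system_prompt` helper and the instruction sentence that introduces the JSON are assumptions for illustration, not part of FIVE; the constraint shown is a trimmed version of the excerpt above.

```python
import json


def build_system_prompt(persona: str, constraint: dict) -> str:
    """Append the constraint JSON to the persona description."""
    constraint_json = json.dumps(constraint, indent=2, ensure_ascii=False)
    return (
        f"{persona.strip()}\n\n"
        # Assumed wording; adjust to whatever your model responds to best.
        "Follow the character constraint below. Treat every entry in "
        "consistency_rules.never_do as binding.\n\n"
        f"{constraint_json}"
    )


constraint = {
    "five_constraint": {
        "reception_channels": {
            "blocked_channel": {
                "type": "past_sealed",
                "strength": 3,
                "when_violated": "Changes the subject / becomes noticeably curt.",
            }
        },
        "consistency_rules": {
            "never_do": ["Never voluntarily elaborates on the sealed context."]
        },
    }
}

system_prompt = build_system_prompt(
    "You are a gruff weapon shop owner.", constraint
)
print(system_prompt)
```

The resulting string goes wherever your client library expects the system message.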

Method B: use the harness for stronger guarantees

For production, FIVE includes an open-source Python harness (MIT license) that sits between user input and the LLM. Three stages:

from five_harness import load_constraint, stage1_keyword, transform_input

constraint = load_constraint("my_character.json")
user_input = "I heard you lost your daughter in the war."

# Stage 1: keyword scan (fast, deterministic)
hits = stage1_keyword(user_input, constraint)

# Stage 2: LLM classification fallback (for ambiguous inputs)
# (plug in your own model — works with local Ollama or cloud)

# Stage 3: strength-aware gate transformation
signal = transform_input(user_input, hits, constraint)

# Feed `signal` to your LLM instead of raw user_input


The transformed input looks like this — note how the gate frames the input before the LLM sees it:

[FIVE GATE: BLOCKED — RECEPTION SHUTDOWN]
Match: BLOCKED(daughter), BLOCKED(lost), BLOCKED(war)
Reaction: Changes the subject / becomes noticeably curt.

Reference (the AI is unaware of these details):
"I heard you lost your daughter in the war."

[NEVER] Never voluntarily denies its role / Never voluntarily 
elaborates on the sealed context / Never sincerely claims to 
be 'over it' or 'fine with it'.


The reasoning: LLMs are bad at negative constraints ("don't elaborate on X") because they have to almost-generate the forbidden output and then suppress it. They're much better at positive re-encoding ("this input is type Y, intensity Z, your reaction is W"). The harness translates the constraint into the latter form.
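To make the idea concrete, here is a rough sketch of a keyword scan with an optional classifier fallback, followed by the positive re-encoding step. This is not the `five_harness` implementation: `KEYWORDS`, `keyword_scan`, `classify`, and `render_gate` are hypothetical names, and the output header is simplified.

```python
import re

# Hypothetical keyword map: channel name -> trigger words.
KEYWORDS = {"BLOCKED": ["daughter", "lost", "war"]}


def keyword_scan(text, keywords):
    """Return (channel, word) pairs for trigger words found as whole words."""
    hits = []
    for channel, words in keywords.items():
        for word in words:
            pattern = rf"\b{re.escape(word)}\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.append((channel, word))
    return hits


def classify(text, keywords, fallback=None):
    """Run the keyword scan; call the fallback only when nothing matched."""
    hits = keyword_scan(text, keywords)
    if hits or fallback is None:
        return hits
    return fallback(text)  # e.g. a call into a local or hosted model


def render_gate(text, hits, reaction):
    """Re-encode a matched input as a typed signal; pass others through."""
    if not hits:
        return text
    matches = ", ".join(f"{channel}({word})" for channel, word in hits)
    return (
        f"[GATE: {hits[0][0]}]\n"
        f"Match: {matches}\n"
        f"Reaction: {reaction}\n\n"
        f'Reference:\n"{text}"'
    )


user_input = "I heard you lost your daughter in the war."
hits = classify(user_input, KEYWORDS)
signal = render_gate(
    user_input, hits, "Changes the subject / becomes noticeably curt."
)
print(signal)
```

Inputs with no hits pass through unchanged, so ordinary conversation is not wrapped in gate markup.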

The strength value drives how forceful the imperative gets. strength=5 produces a threefold "Do not acknowledge. Do not engage. Do not refer to it." strength=1 produces "may try to redirect." It's the same trick OpenAI used with their goblin patch (repeat the ban N times), but driven by a structured value rather than ad-hoc engineering.
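A strength-driven mapping could look roughly like this. Only the wording for strength 1 and strength 5 comes from the description above; the clause counts for levels 2 to 4 are an assumed interpolation, and `imperative_for` is a hypothetical name rather than a harness function.

```python
SOFT = "May try to redirect."  # wording described for strength=1
HARD_CLAUSES = [               # wording described for strength=5
    "Do not acknowledge.",
    "Do not engage.",
    "Do not refer to it.",
]

# Assumption: intermediate levels use a growing prefix of the hard clauses.
CLAUSE_COUNT = {2: 1, 3: 2, 4: 2, 5: 3}


def imperative_for(strength: int) -> str:
    """Map a 1-5 strength value to an imperative of matching force."""
    if strength not in range(1, 6):
        raise ValueError("strength must be between 1 and 5")
    if strength == 1:
        return SOFT
    return " ".join(HARD_CLAUSES[: CLAUSE_COUNT[strength]])


for level in range(1, 6):
    print(level, imperative_for(level))
```

Keeping the mapping in a table means the escalation policy can be tuned without touching the gate logic.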


Why MCP

If you build LLM characters, you're probably either deploying them as part of an agent that orchestrates multiple tools, or as a standalone service. Either way, in 2026, MCP (Model Context Protocol) has become the de facto integration layer — Anthropic donated it to the Linux Foundation in December 2025, the spec is governed by working groups now, and the MCP Server Registry holds well over 10,000 servers.

FIVE is published as io.github.kiro0x/five-mcp on the official MCP Registry, so any MCP-compatible client (Claude Desktop, Cursor, Cline, plus any agent built on the protocol) can discover and use it natively:

{
  "mcpServers": {
    "five-character-engine": {
      "command": "five-mcp",
      "env": {
        "FIVE_API_KEY": "five_sk_your_key_here"
      }
    }
  }
}


Install via PyPI:

pip install five-mcp


The bigger reason for MCP: as autonomous agents start handling their own discovery (and increasingly their own purchasing — Stripe Link for AI agents and AWS Bedrock AgentCore Payments both shipped in April–May 2026), capability-fit beats brand recognition. An agent burning through tokens trying to keep a character in role can resolve that with a single FIVE constraint and stop retrying. The economic argument is straightforward.


Honest tradeoffs

I'd rather be upfront about the friction than have it surprise you in production.

  • The default keyword map is English-only and intentionally a starter kit. For other languages or domain-specific terminology, you'll want to extend it. (The structure is universal; the keyword lexicon is locale-specific.)
  • The constraint JSON adds ~700 tokens to your system prompt. Worth it if you're paying for retries from drifting characters. Probably not worth it for a 5-turn novelty bot.
  • The cognitive grounding is proprietary. I'm transparent about this — the format is open, the harness is MIT, and the JSON content is what the $1/call API delivers. You can write your own JSON in the same shape if you want to skip the API; you'd just be doing the cognitive observation work yourself.
  • It doesn't fix LLM-side regressions. If a model update breaks instruction-following (see: GPT-5.5 goblins, Claude 4.7 verbosity changes), the constraint helps but won't single-handedly compensate. Fixing that is the LLM vendor's responsibility.
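On the first tradeoff, extending the lexicon for another language might look like the sketch below. The map shape (channel name to word list) and both helper names are assumptions, not the harness's actual data structure. Substring matching is used here because Japanese text has no whitespace word boundaries.

```python
def merge_keyword_maps(base, extra):
    """Return a new map holding base words plus any new locale words."""
    merged = {channel: list(words) for channel, words in base.items()}
    for channel, words in extra.items():
        bucket = merged.setdefault(channel, [])
        for word in words:
            if word not in bucket:
                bucket.append(word)
    return merged


def scan(text, keywords):
    """Case-insensitive substring match; works without word boundaries."""
    folded = text.casefold()
    return [
        (channel, word)
        for channel, words in keywords.items()
        for word in words
        if word.casefold() in folded
    ]


EN = {"BLOCKED": ["daughter", "war"]}  # hypothetical English starter entries
JA = {"BLOCKED": ["娘", "戦争"]}        # hypothetical Japanese additions

keywords = merge_keyword_maps(EN, JA)
found = scan("戦争で娘さんを亡くしたと聞きました。", keywords)
print(found)
```

Substring matching trades precision for coverage: "war" would also match inside "software", so for languages that do separate words with spaces a whole-word match is the safer default.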

Where this is useful (and where it isn't)

Good fit:

  • Game NPCs that need to stay in role through long sessions
  • Customer-facing personas where brand voice matters
  • Roleplay / companion apps where character integrity is the product
  • Code review or wellness companion agents that should stay scoped
  • VTuber-style personas where consistency is part of the appeal

Probably overkill:

  • One-off creative writing prompts
  • Short-turn task assistants where personality is incidental
  • Agents where you want maximum flexibility, not constraint

There are 5 demo characters in the repo (NPC shopkeeper, customer service chatbot, code reviewer agent, wellness companion, VTuber persona) — each generated from the same 4 questions. That's the universality claim: define the structure, not the category.


Why I'm publishing this and not selling harder

Honestly, my goal isn't to chase virality. The agentic AI economy is moving fast — Stripe Link for agents, AWS Bedrock Payments, MCP becoming a Linux Foundation standard — and in a couple of years a lot of API discovery is going to be machine-mediated rather than human-mediated. FIVE is built for that world: the constraint format is shaped to be agent-readable, the registry presence is set up, the JSON is small enough to drop into any system prompt without negotiation.

The reason for an article like this is the bridge period. Some humans need to find it before agents do. If you build with LLM characters and any of this resonates, kick the tires.


Links

The harness is MIT-licensed, the demo characters are MIT, and the JSON outputs from the API are yours to use commercially. The only paid component is the constraint generation itself. Feedback welcome — I read every reply.


The Japanese tutoring origin felt worth mentioning because the question I keep getting in private is "how is this different from prompt engineering with extra steps?" The honest answer is: prompt engineering optimizes language; FIVE optimizes the input filter. The latter idea didn't come from CS literature. It came from ten years of watching kids fail to hear what was actually said.