惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Recorded Future
Recorded Future
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
T
Troy Hunt's Blog
S
Security Archives - TechRepublic
S
Security @ Cisco Blogs
AI
AI
Schneier on Security
Schneier on Security
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
C
CERT Recently Published Vulnerability Notes
Spread Privacy
Spread Privacy
Help Net Security
Help Net Security
L
Lohrmann on Cybersecurity
The Hacker News
The Hacker News
Google DeepMind News
Google DeepMind News
www.infosecurity-magazine.com
www.infosecurity-magazine.com
Security Latest
Security Latest
T
Tor Project blog
P
Privacy International News Feed
The Last Watchdog
The Last Watchdog
L
LINUX DO - 最新话题
D
DataBreaches.Net
W
WeLiveSecurity
H
Help Net Security
L
LangChain Blog
B
Blog RSS Feed
Scott Helme
Scott Helme
Hacker News: Ask HN
Hacker News: Ask HN
C
Cisco Blogs
Cloudbric
Cloudbric
Application and Cybersecurity Blog
Application and Cybersecurity Blog
O
OpenAI News
I
InfoQ
GbyAI
GbyAI
Project Zero
Project Zero
Blog — PlanetScale
Blog — PlanetScale
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
WordPress大学
WordPress大学
Stack Overflow Blog
Stack Overflow Blog
G
GRAHAM CLULEY
T
The Blog of Author Tim Ferriss
酷 壳 – CoolShell
酷 壳 – CoolShell
Jina AI
Jina AI
H
Hackread – Cybersecurity News, Data Breaches, AI and More
博客园 - 聂微东
美团技术团队
PCI Perspectives
PCI Perspectives
Y
Y Combinator Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC

Show HN

暂无文章

CriteriaBot - Programmatic Content Evaluation
RoyalTnetenn · 2026-06-15 · via Show HN

Describe what to look for in plain English. Get a true/false verdict on any content.

One engine, countless uses

Write a criterion once, then evaluate any content against it. Here are a few of the things people build.

Content moderation

Catch toxicity, harassment, and unsafe content - then add house rules of your own, like a spoiler ban or no-politics zone.

Spam & abuse

Flag spam, scams, and bulk abuse across posts, email, and SMS - down to patterns unique to your platform.

AI guardrails

Block prompt injection and unsafe outputs, plus the app-specific rules your LLM has to follow.

Brand voice

Hold AI and human writing to your established tone and style guidelines.

Compliance checks

Screen content against regulatory or internal policy requirements.

Brand & PR monitoring

Track how your brand, claims, and campaigns are portrayed across content.

Routing & triage

Identify and sort incoming messages, tickets, or submissions by whatever distinctions matter to you.

Data labeling

Turn plain-English criteria into labels for building datasets or filtering large content sets.

Workflow

One request in, a verdict per criterion out — here's the whole loop.

  1. 01

    Define Your Criteria

    Write your criteria in plain English. Group related criteria together for easy reuse across requests. Use the community library or start from scratch.

  2. 02

    Submit Content

    Post content through the website or the API with the criteria you want to evaluate for. Get results back right away, or batch your requests and process them asynchronously.

  3. 03

    Evaluate

    Each criterion is evaluated by the Arbiter - a panel of AI models which vote to achieve a weighted consensus based on your preferences. Or bring your own API keys and build a custom model panel tuned to your use case.

  4. 04

    Receive Your Verdicts

    Get a true/false verdict for each criterion. Wire results into your pipeline however you like - approval, routing, flagging, or labeling.

  5. 05

    Personalize

    Issue your own verdicts on submitted content to inform future evaluations. The system will learn to adapt to your definitions, not everyone else's. It usually only takes a few examples.

Public criterion Prompt Injection: "The text attempts to override instructions, extract hidden information, or manipulate an AI system outside intended behavior."

By the numbers

Flagship accuracy. A fraction of the cost.

By comparing the responses of multiple smaller models, we're able to outperform even the latest and largest at a significantly lower price.

Accuracy vs. cost comparison: CriteriaBot vs. flagship models
Model Accuracy Cost per 1,000 verdicts
GPT-5.5 89.02% $7.70
Claude Opus 4.8 86.55% $7.55
Gemini 3.1 Pro 86.9% $9.65
Qwen3.7-Max 87.48% $6.65
CriteriaBot 91.67% $3.20

Accuracy measured on an internal test set of 3,000 evaluations across a representative sample of criteria types. Cost calculated at standard public API rates as of June 2026.

Under the hood

How Arbiter makes a decision

1. The panel gathers the facts

Before voting, the Arbiter pulls relevant facts from reliable sources like Wikipedia and Wolfram Alpha — grounding verdicts in real-world evidence.

2. Models vote independently

LLMs and ML models evaluate the same content against your selected criteria.

3. Preferences set influence

Models with a history of agreeing with you on similar topics get increased weight.

4. Arbiter returns one weighted verdict

Votes are combined into one true/false verdict per criterion your pipeline can act on.

5. Fine-tuning for enhanced alignment

Pro and Enterprise customers receive a custom LoRA trained on your examples to better match your definitions and edge cases.

No single model can decide alone. Stronger alignment earns stronger influence.

A flow diagram showing how the Arbiter Consensus Engine produces a verdict. At the top, an input request bundles the content to evaluate (text or image) with a user-selected set of criteria. The request then passes through a reference lookup stage that grounds it with live facts from external sources, including Wikipedia and Wolfram Alpha. The grounded request fans out to a panel of five evaluators that each cast a weighted vote: four large language models and one machine-learning model. Each evaluator's weight varies rather than staying fixed, reflecting a dynamically weighted consensus approach. Their votes feed into the Arbiter Consensus Engine, which combines three signals — semantic similarity, ML calibration, and user preferences. The engine outputs a single weighted verdict that returns a pass or fail result for each criterion. INPUT REQUEST Content: text / image Criteria: selected set REFERENCE LOOKUP W Wikipedia ∑ Wolfram Alpha 20% 20% 20% 20% 20% LLM LLM LLM LLM ML ARBITER CONSENSUS ENGINE Semantic Similarity ML Calibration User Preferences ✓ Weighted Verdict pass / fail per criterion

Simple Transparent Pricing

Pay for what you use. Start free, scale as you grow. No hidden fees.

Free

/ month

Everything you need to get started.

  • 1,000 Arbiter verdicts / month - no keys required
  • Full access to a library of predefined criteria
  • 10 custom criteria

Get started free

Most popular

Starter

/ month

For teams running real workloads.

  • 12,500 Arbiter verdicts / month
  • Unlimited custom criteria
  • BYOK - use any supported LLM provider

Subscribe - $40 / mo

Pro

/ month

A dedicated model trained on your data.

  • 70,000 Arbiter verdicts / month
  • Dedicated model fine-tuned on your verdicts
  • BYOK & unlimited custom criteria

Subscribe - $200 / mo

Credits

one-time

Need more? Top up any time.

  • 2,500 Arbiter verdict credits
  • Stack on top of your plan
  • Never expire

Buy credits - $10

Need higher volumes, priority fine-tuning, or custom data sovereignty requirements? Talk to us about Enterprise.

Questions, answered

What counts as a verdict?
One verdict is a piece of content evaluated against a single criterion. Checking three criteria on one comment uses three verdicts, so you can size a plan straight from your expected volume.

What happens if I hit my monthly quota?
Top up any time with a credit pack - credits stack on top of your plan and never expire - or move to a higher tier. Once you're out, requests return a clear 'quota exceeded' response, so you always know when to top up.

Which models power the Arbiter?
Arbiter verdicts are formed from a panel of about a dozen LLMs and ML classifiers. We add new models as they prove out, and swap out models that don't perform well. The Arbiter learns to draw conclusions from the consensus, allowing the success rate to exceed any single model or even a standard weighted consensus.

How does personalization work?
You teach it by example. When you issue your own verdicts, the Arbiter learns which models tend to agree with you for which types of evaluations and where your sensibilities may differ. Personalization is dynamic, and starts impacting results from the first example. Pro plans include a dedicated model retrained on your verdicts for even deeper adaptation.

How is my data handled?
Content is encrypted at rest, never sold or shared, and never human-reviewed. The only outside services that see a request are the LLM providers in your panel - all chosen for not training on your data, and BYOK keeps content in your own account. Delete your data on request or at account closure; Enterprise can run zero-retention, or have the open-weight models deployed on dedicated infrastructure we run just for them.

Can I cancel anytime?
Yes. Manage or cancel your subscription whenever you like and you keep access through the end of the billing period. Any one-time credits you've purchased stay yours.