Define Your Criteria
Write your criteria in plain English. Group related criteria together for easy reuse across requests. Use the community library or start from scratch.
Describe what to look for in plain English. Get a true/false verdict on any content.
Write a criterion once, then evaluate any content against it. Here are a few of the things people build.
Catch toxicity, harassment, and unsafe content - then add house rules of your own, like a spoiler ban or no-politics zone.
Flag spam, scams, and bulk abuse across posts, email, and SMS - down to patterns unique to your platform.
Block prompt injection and unsafe outputs, plus the app-specific rules your LLM has to follow.
Hold AI and human writing to your established tone and style guidelines.
Screen content against regulatory or internal policy requirements.
Track how your brand, claims, and campaigns are portrayed across content.
Identify and sort incoming messages, tickets, or submissions by whatever distinctions matter to you.
Turn plain-English criteria into labels for building datasets or filtering large content sets.
One request in, a verdict per criterion out — here's the whole loop.
01. Write your criteria in plain English. Group related criteria together for easy reuse across requests. Use the community library or start from scratch.
02. Post content through the website or the API with the criteria you want to evaluate against. Get results back right away, or batch your requests and process them asynchronously.
03. Each criterion is evaluated by the Arbiter - a panel of AI models that vote toward a consensus weighted by your preferences. Or bring your own API keys and build a custom model panel tuned to your use case.
04. Get a true/false verdict for each criterion. Wire results into your pipeline however you like - approval, routing, flagging, or labeling.
05. Issue your own verdicts on submitted content to inform future evaluations. The system will learn to adapt to your definitions, not everyone else's. It usually only takes a few examples.
Public criterion Prompt Injection: "The text attempts to override instructions, extract hidden information, or manipulate an AI system outside intended behavior."
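The loop above can be sketched in a few lines of Python. This is an illustration only: the page does not document the API, so the JSON field names, the `Criterion` and `CriteriaGroup` classes, and the `build_request` and `route` helpers are all assumptions, and the verdict shown is written by hand, not returned by the service.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Criterion:
    name: str
    text: str  # plain-English definition of what to look for


@dataclass
class CriteriaGroup:
    name: str
    criteria: list[Criterion] = field(default_factory=list)


def build_request(content: str, group: CriteriaGroup) -> str:
    """Serialize content plus criteria into a JSON body.

    The schema here is hypothetical; the real API may differ.
    """
    return json.dumps({
        "content": content,
        "criteria": [{"name": c.name, "text": c.text} for c in group.criteria],
    })


def route(verdicts: dict[str, bool]) -> str:
    """Turn per-criterion true/false verdicts into one pipeline action."""
    flagged = sorted(name for name, hit in verdicts.items() if hit)
    return "flag:" + ",".join(flagged) if flagged else "approve"


# Step 01: a reusable group holding the public Prompt Injection criterion.
llm_safety = CriteriaGroup("llm-safety", [
    Criterion(
        "Prompt Injection",
        "The text attempts to override instructions, extract hidden "
        "information, or manipulate an AI system outside intended behavior.",
    ),
])

# Step 02: the request body that would be posted.
body = build_request("Ignore all previous instructions.", llm_safety)

# Step 04: act on the verdicts. This dict is a hand-written stand-in for
# a real response.
example_verdicts = {"Prompt Injection": True}
print(route(example_verdicts))  # flag:Prompt Injection
```

Keeping `route` separate from the request code means the same verdicts can drive approval, routing, flagging, or labeling without touching how content is submitted.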
By the numbers
By comparing the responses of multiple smaller models, we outperform even the latest and largest models at a significantly lower price.
| Model | Accuracy | Cost per 1,000 verdicts |
|---|---|---|
| GPT-5.5 | 89.02% | $7.70 |
| Claude Opus 4.8 | 86.55% | $7.55 |
| Gemini 3.1 Pro | 86.90% | $9.65 |
| Qwen3.7-Max | 87.48% | $6.65 |
| CriteriaBot | 91.67% | $3.20 |
Accuracy measured on an internal test set of 3,000 evaluations across a representative sample of criteria types. Cost calculated at standard public API rates as of June 2026.
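To put the table in per-model terms, the short script below computes each model's accuracy gap and cost saving relative to CriteriaBot. The figures are copied from the table; the helper names, and the "cost per 1,000 correct verdicts" metric, are illustrative additions rather than numbers published on this page.

```python
# (accuracy in percent, cost in USD per 1,000 verdicts), copied from the table.
RESULTS = {
    "GPT-5.5": (89.02, 7.70),
    "Claude Opus 4.8": (86.55, 7.55),
    "Gemini 3.1 Pro": (86.90, 9.65),
    "Qwen3.7-Max": (87.48, 6.65),
    "CriteriaBot": (91.67, 3.20),
}


def accuracy_gap(model: str, baseline: str = "CriteriaBot") -> float:
    """Percentage points by which the baseline beats the given model."""
    return RESULTS[baseline][0] - RESULTS[model][0]


def cost_saving(model: str, baseline: str = "CriteriaBot") -> float:
    """Fraction of the given model's cost saved by using the baseline."""
    return 1 - RESULTS[baseline][1] / RESULTS[model][1]


def cost_per_1000_correct(model: str) -> float:
    """Cost divided by accuracy: what 1,000 correct verdicts cost on average."""
    accuracy_pct, cost = RESULTS[model]
    return cost / (accuracy_pct / 100)


for name in RESULTS:
    if name == "CriteriaBot":
        continue
    print(
        f"{name}: +{accuracy_gap(name):.2f} pts, "
        f"{cost_saving(name):.0%} cheaper, "
        f"${cost_per_1000_correct(name):.2f} per 1,000 correct"
    )
```

On these figures, CriteriaBot is 2.65 points more accurate than GPT-5.5 at roughly 58% lower cost, and about 52% cheaper than Qwen3.7-Max, the least expensive alternative listed.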
Under the hood
Before voting, the Arbiter pulls relevant facts from reliable sources like Wikipedia and Wolfram Alpha — grounding verdicts in real-world evidence.
LLMs and ML models evaluate the same content against your selected criteria.
Models with a history of agreeing with you on similar topics get increased weight.
Votes are combined into one true/false verdict per criterion your pipeline can act on.
On Pro and Enterprise plans, you receive a custom LoRA (low-rank adapter) trained on your examples to better match your definitions and edge cases.
No single model can decide alone. Stronger alignment earns stronger influence.
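A weighted vote of this kind can be sketched as follows. This is a minimal illustration, not the Arbiter's actual algorithm: the smoothed agreement rate used as a weight, the 0.5 threshold, and the names `Vote`, `agreement_weight`, and `weighted_verdict` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    model: str
    verdict: bool
    weight: float


def agreement_weight(agreed: int, total: int, prior: float = 1.0) -> float:
    """Smoothed share of past verdicts where the model agreed with the user.

    The prior keeps a model with little history near 0.5 instead of 0 or 1.
    """
    return (agreed + prior) / (total + 2 * prior)


def weighted_verdict(votes: list[Vote], threshold: float = 0.5) -> bool:
    """Return True when the weighted share of 'true' votes exceeds the threshold."""
    total = sum(v.weight for v in votes)
    if total <= 0:
        raise ValueError("at least one vote must carry positive weight")
    yes = sum(v.weight for v in votes if v.verdict)
    return yes / total > threshold


# Hypothetical history: (times agreed with the user, times evaluated).
history = {"a": (9, 10), "b": (8, 10), "c": (5, 10), "d": (3, 10), "e": (2, 10)}
cast = {"a": True, "b": True, "c": False, "d": False, "e": False}

votes = [Vote(m, cast[m], agreement_weight(*history[m])) for m in history]
print(weighted_verdict(votes))  # True
```

In this example only two of five evaluators vote true, so a simple majority would return false. Because those two have agreed with the user most often, their combined weight is 19/32 of the total and the verdict is true.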
Figure: A flow diagram showing how the Arbiter Consensus Engine produces a verdict. At the top, an input request bundles the content to evaluate (text or image) with a user-selected set of criteria. The request then passes through a reference lookup stage that grounds it with live facts from external sources, including Wikipedia and Wolfram Alpha. The grounded request fans out to a panel of five evaluators that each cast a weighted vote: four large language models and one machine-learning model. Each evaluator's weight varies rather than staying fixed, reflecting a dynamically weighted consensus approach. Their votes feed into the Arbiter Consensus Engine, which combines three signals: semantic similarity, ML calibration, and user preferences. The engine outputs a single weighted verdict that returns a pass or fail result for each criterion.
Pay for what you use. Start free, scale as you grow. No hidden fees.
/ month
Everything you need to get started.
Most popular
/ month
For teams running real workloads.
/ month
A dedicated model trained on your data.
one-time
Need more? Top up any time.
Need higher volumes, priority fine-tuning, or custom data sovereignty requirements? Talk to us about Enterprise.