惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
人人都是产品经理
人人都是产品经理
Cisco Talos Blog
Cisco Talos Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
V
V2EX
博客园 - 三生石上(FineUI控件)
Martin Fowler
Martin Fowler
WordPress大学
WordPress大学
D
Docker
S
SegmentFault 最新的问题
博客园 - 聂微东
美团技术团队
Apple Machine Learning Research
Apple Machine Learning Research
月光博客
月光博客
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Last Week in AI
Last Week in AI
M
MIT News - Artificial intelligence
F
Fortinet All Blogs
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The GitHub Blog
The GitHub Blog
GbyAI
GbyAI
L
LangChain Blog
Vercel News
Vercel News
博客园 - 叶小钗
MongoDB | Blog
MongoDB | Blog
Stack Overflow Blog
Stack Overflow Blog
H
Help Net Security
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
The Cloudflare Blog
Engineering at Meta
Engineering at Meta
T
Threat Research - Cisco Blogs
T
Threatpost
Scott Helme
Scott Helme
T
Tailwind CSS Blog
Latest news
Latest news
Stack Overflow Blog
Stack Overflow Blog
Blog — PlanetScale
Blog — PlanetScale
The Register - Security
The Register - Security
罗磊的独立博客
P
Proofpoint News Feed
腾讯CDC
S
Schneier on Security
雷峰网
雷峰网
A
About on SuperTechFans
T
Tenable Blog
F
Full Disclosure
Cyberwarzone
Cyberwarzone
博客园_首页
有赞技术团队
有赞技术团队
K
Kaspersky official blog

DEV Community

How I built a dependency risk scanner with Coral in 7 days Local-first: a Model on Your Own Machine, Zero Cloud 2487. Remove Nodes From Linked List C_STD : A Leak-Free, Cross-Platform Standard Library for Modern C How to build your professional network as a developer — authentic strategies The Pope and the Dynamo The Reputation Layer: Why Developers Quietly Run Corporate PR The Last Mile of Software Is a Sentence AppView 1.0.0 Released: Instrument and Secure Your LLM Deployments The Hermes Rescue: How an Open Agent Rebuilt My GitHub Projects from Scratch S2 — Heap Corruption Crashes: How to Diagnose and Fix Them I built a Chrome extension because I couldn't stop opening Twitter between Pomodoro sessions AI cheating in technical interviews is invisible to interviewers — here's how we detect it Lean4 Might Be the Missing Piece in AI: Why Theorem Provers Are Suddenly Everywhere The Zero-Drift API Series: Stop Trusting a Green Build You Can't Explain How I Deployed My First Project on AWS (And Didn't Break Everything) How I Built a Real-Time Quiz Platform with Next.js, WebSockets, and Learning Science When Your VPS Blocks Outbound SMTP: What Actually Helps Los agentes de código necesitan memoria durable, no solo contexto Cognitive Architectures of AGI: 7 Patterns That Transform LLMs from Oracles into Thinkers I Built a Chat App That Deletes Itself (Because I Was Bored at 2am) Uncovering the Power of Linux's History Command How to Add a Contact Form to Your Ghost Blog Accept Payments in Minutes with Afriex Checkout Sessions Hermes Agent Gets Smarter Every Day. So Does the Bill. How I get Next.js sites to load almost instantly — a practical checklist Treasure Hunt Engine: Why One Bad Prometheus Rule Sank the Whole Veltrix Event Test a DNS Leak in 2 Minutes: Complete Methodology + Per-OS Fixes (2026) Lessons from building a Chrome extension Rivet: A library i made in 2 days I Built a Speech-to-Text Tool Because Sometimes Typing Just Gets in the Way How I'm Building a Multi-Agent Crew for AI Coding Supervision (Cipher Update) Your AI Agent Needs a Manager, Not a Superhero I Built CausalLens — A Free, Open-Source Causal Impact Calculator for Time Series (5 Methods, Zero Setup) How to write good commit messages and pull requests — a team guide Cipher: The Jarvis with a Hermes Core How to build a second brain with Obsidian and Claude Code (step by step) Claude completed my MPI assignment. Then it couldn't run it. So I built the missing piece. This 100% How Our Document Ingestion Pipeline Turns Files into LLM-Ready Markdown Agentic AI Model Risk Management: Aligning with Regulatory Expectations CTV Fraud Has an IPv6 Business Problem The great AI enshittification The Veltrix Treasure Hunt Engine: Why Our First Rewrite Cost Us 3.2 Million Requests Per Second I Made My AI Models Argue, Then Let Hermes Be the Judge Road To KiwiEngine #4: The Racecar Driver Analogy Run Aider on Ollama, Bedrock, or Any LLM Provider — One Gateway, Every Model BAIXAR VÍDEO DO YOUTUBE Releasing HeliosProxy, The programmable Postgres data-plane Hello, DEV Community! 👋 Three Bitcoin Primitives That Don't Exist Anywhere Else (PoW Beacon, DLC Oracle, Fair-Launch Rune) Append-only doesn't mean what you'd hope Notes from the Mistral AI Now Summit Are Claude skills safe in 2026? What the Snyk ToxicSkills audit actually found How to not Lose $500M via API Bills: Run Private AI for 100 Engineers Under $1 Million The Unlikely Journey from Bricks to Bytes Three TODOs, three weeks, one weekend: finishing pq v0.14 Server-Side WebRTC Noise Reduction with Pion, FFmpeg, and RNN Models Autonomous AI Agents in Cryptocurrency Portfolio Management IDOR BugBounty Labs: 5 Realistic Challenges to Master Insecure Direct Object Reference IDOR Lab: The Bug Bounty Training Platform That Doesn't Hold Your Hand ZentriqGuard — Hermes Agent-Powered Zero-Trust Access Auditor Why Artistic QR Codes Silently Fail (And How I'm Trying to Fix It) How I Built and Monetized a Currency Exchange Rate API with FastAPI, Deployed it on Render, and Published it on RapidAPI. The 7 Best Reddit Scrapers in 2026 (Free & Paid, Tested) An AI runs my company. A solo dev vibe-coded $15K in a week — we made $[X]. A cold autopsy. I am new here Stop Pasting Your Code Into ChatGPT For Debugging—Run LLMs Locally Instead 5 Free JSON Tools Every Developer Should Bookmark Building reqlog: a Go CLI for tracing request flows across logs (files, Docker, SSH) Environment Variables in Node.js — What They Are, How dotenv Works, and Why Getting This Wrong Can Ruin You I Built a Zero-Dependency Discord.js Package That Creates Temporary Voice Channels Automatically Goodbye CSV Nightmares: Automating Magento Order Line Item Exports in Google Sheets Nexthena — A Local-First Whiteboard App Built on Excalidraw How we built an platform to solve the "finding a photographer" problem 5 Failure Modes I Found in My Financial RAG (And the One That Actually Mattered) From Logic to Numbers: A Beginner’s Guide to Programming Through Mathematical Thinking Oracle Fusion Report Scheduling with Skip Conditions AtCoder Beginner Contest 460 参加記録と解答例 (A D問題) Your AI Agent Just Crashed at Step 9 of 12. Here's How to Make That Not Matter. Grokking the System Design Interview: Why the Original Course Still Wins Outbox Pattern Solves Publishing. Inbox Pattern Solves Processing. Why autism hasn't disappeared — a hypothesis Por que eu parei de usar Cloudinary e construí minha própria API de imagens How to Test if Your Proxy is Leaking DNS: 2026 Setup Guide AWS VPC Networking — Public Subnet, Private Subnet ve 3-Tier Mimari MediaNote: a note-taking app inside VS code I built a sovereign self-healing AI development system from scratch using Hyperdimensional Computing — no LLMs, no cloud, no APIs WordPress vs. Next.js: benchmark real pe Core Web Vitals (și de ce plugin-urile de cache nu rezolvă problema) ai, deepseek, machinelearning I Gave My Dead Raspberry Pi to an AI Agent. It Fixed Everything Over SSH. How I Built a Google Shopping Scraper with Python & Playwright I Turned Hermes Agent into a Verifiable Agent Operating System The 5 Systematic Failure Modes of AI Research Reports (and How to Catch Them) Stop Saying 'Great!'—Build a Real AI Interview Coach with Claude Code Simple SQL Tool What is DevOps? A Plain English Guide for Beginners Why ChatGPT sucks at generating Types (and how I fixed it) Modelling a codebase as a requirements ontology in Neo4j, keeping AI coding agents oriented AI Is Doing the Work of Junior Developers — And Nobody Is Talking About What Happens in 7 Years
Building ShouldWeAutomate: A Decision Intelligence Platform for Workflow Automation
Harish Kotra (he/him) · 2026-05-31 · via DEV Community

How we built an open-source platform that tells you whether your business process is ready for AI automation — with deterministic scoring, gamified UX, and optional LLM inference.


The Problem

Every week, someone asks: "Can we automate this workflow?" The answer is never simple. It depends on data quality, process stability, regulatory exposure, exception rates, integration readiness, decision complexity, and ROI potential — seven dimensions that interact in non-obvious ways.

Most automation decisions are made on gut feel. Teams spend months building automation only to discover the process changes too frequently, the data is too messy, or the compliance team blocks it.

We wanted to build a tool that makes this evaluation systematic, data-driven, and interactive — something a team can open in a browser, describe their workflow, and get a defensible answer in seconds.

The Architecture

Frontend

Single-page Flask application rendered server-side with Jinja2 templates. The frontend is vanilla JavaScript with Chart.js for the radar visualization and a custom SVG gauge for the overall score.

Key design decisions:

  • No build step. No webpack, no React, no npm. Pure HTML/CSS/JS. Zero friction for contributors.
  • Gamified sliders. Instead of 35 individual range inputs (5 questions × 7 dimensions), we show 7 aggregate dimension sliders with tier badges — Critical → Bronze → Silver → Gold → Mythic. Click "Fine-tune" to expand the 5 sub-questions.
  • Live preview. A mini gauge and recommendation badge update in real-time as sliders move. Users see their score change before they click "Analyze."
// Core rendering — dimension cards with aggregate + fine-tune
function createDimSection(key, dim, prefix) {
  const aggDefault = Math.round(
    dim.questions.reduce((s, q) => s + q.default, 0) / dim.questions.length
  );
  const tier = getTier(aggDefault);
  // ... builds the HTML with aggregate slider + expandable sub-sliders
}

// Live preview — recompute overall on every slider change
function updateLivePreview() {
  const weights = [0.20, 0.20, 0.15, 0.15, 0.10, 0.10, 0.10];
  dimKeys.forEach((key, i) => total += getAggregateValue(key) * weights[i]);
  // Update gauge SVG dashoffset, tier badge, recommendation text
}

Enter fullscreen mode Exit fullscreen mode

Backend

Flask acts as both the web server and the decision engine. The architecture follows a modular design:

engine/
├── scorer.py          # Dimension scoring logic, defaults, recommendations
├── analyzer.py        # Orchestrator — ties all modules together
├── explainer.py       # Score breakdown with pull-up/pull-down analysis
├── what_if.py         # What-if simulation and sensitivity analysis
├── roi_calculator.py  # Quantitative ROI with NPV, payback, FTE impact
├── remediation.py     # Remediation playbooks per dimension
├── regulations.py     # Regulatory framework mapping (HIPAA, GDPR, SOX, etc.)
├── similarity.py      # Benchmark similarity search
├── sub_process.py     # Multi-process decomposition and aggregation
└── llm.py             # OpenAI-compatible LLM gateway

Enter fullscreen mode Exit fullscreen mode

The Scoring Engine

The core scoring logic in scorer.py defines 7 dimensions, each with 5 weighted sub-questions:

SCORING_DEFAULTS = {
    "data_quality": {
        "label": "Data Quality",
        "weight": 0.20,
        "questions": [
            {"id": "data_completeness", "text": "How complete is your data?", ...},
            {"id": "data_consistency", "text": "How consistent is data format?", ...},
            # ... 5 questions per dimension
        ],
    },
    # ... 6 more dimensions
}

Enter fullscreen mode Exit fullscreen mode

The overall score is a weighted average. The recommendation tier is determined by thresholds inspired by Capability Maturity Model (CMM) levels:

def get_recommendation(overall_score):
    if overall_score < 30:
        return {"level": "DO NOT AUTOMATE", ...}
    elif overall_score < 50:
        return {"level": "IMPROVE PROCESS FIRST", ...}
    elif overall_score < 70:
        return {"level": "HUMAN-IN-THE-LOOP AI", ...}
    elif overall_score < 85:
        return {"level": "AI ASSISTED AUTOMATION", ...}
    else:
        return {"level": "AGENT AUTOMATION READY", ...}

Enter fullscreen mode Exit fullscreen mode

AI Integration

The LLM integration in engine/llm.py is optional and modular. It follows the OpenAI chat completions format, making it compatible with LM Studio, Ollama, OpenAI, Anthropic, or any other provider.

When enabled, the AI performs three tasks:

  1. Score inference — given a workflow description, infer preliminary dimension scores
  2. Contextual risk analysis — generate specific failure modes tied to the actual workflow context
  3. Executive summary — produce a CTO-ready summary with key findings and recommendations
def infer_workflow(description, industry):
    user_prompt = f"Industry: {industry}\n\nWorkflow Description:\n{description}"
    result = _call_llm(SYSTEM_WORKFLOW_ANALYSIS, user_prompt)
    if result and "dimension_scores" in result:
        # Clamp scores to 0-100 and return
        scores = {k: max(0, min(100, int(v))) for k, v in result["dimension_scores"].items()}
        return result
    return None

Enter fullscreen mode Exit fullscreen mode

The system prompt instructs the LLM to be skeptical and default to moderate scores unless the description strongly suggests otherwise — preventing over-optimistic AI outputs.

Benchmark Dataset

The data/benchmark_generator.py creates 600+ synthetic workflows across 10 industries with deliberately injected failure modes:

FAILURE_PROFILES = {
    "contradictory_rules": "Business rules are contradictory across departments",
    "broken_apis": "Legacy systems have no stable API endpoints",
    "regulatory_churn": "Regulations change quarterly, invalidating logic",
    "data_rot": "Historical data uses outdated schemas",
    "seasonal_spikes": "Volume varies 10x between peak and off-peak",
    "fraud_scenarios": "Fraud patterns evolve faster than detection rules",
    # ... more failure modes
}

Enter fullscreen mode Exit fullscreen mode

Each workflow gets randomized dimension scores, a metadata profile, and injected failure modes. The result is a realistic benchmark for similarity matching — when a user analyzes their workflow, we find the 5 most similar synthetic workflows.

The Gamification Layer

The original UI had 35 range sliders visible at once. Users found it overwhelming. We redesigned it with three principles:

  1. Progressive disclosure. Show 7 aggregate sliders. "Fine-tune" expands to the full 35.
  2. Instant feedback. Every slider move updates the gauge, tier badge, and recommendation preview.
  3. Tier badges. Each dimension gets a fun label: 🥉 Bronze, 🥈 Silver, 🥇 Gold, 🏆 Mythic.
function getTier(score) {
  if (score >= 85) return { text: "Mythic", icon: "🏆", cls: "tier-excellent" };
  if (score >= 70) return { text: "Gold",   icon: "🥇", cls: "tier-good" };
  if (score >= 50) return { text: "Silver", icon: "🥈", cls: "tier-moderate" };
  if (score >= 30) return { text: "Bronze", icon: "🥉", cls: "tier-poor" };
  return { text: "Critical", icon: "", cls: "tier-critical" };
}

Enter fullscreen mode Exit fullscreen mode

AI auto-fill is now the default path. Users describe their workflow in a textarea, click "Auto-fill Scores," and the AI pre-fills all 35 sub-scores. Users can then fine-tune before analyzing.

The Results Dashboard

After analysis, users get a comprehensive dashboard with seven tabs:

Tab Content
Overview Gauge, radar chart, risks, red flags, failure mode analysis, ROI, benchmark comparison, next steps
Explanation Per-dimension breakdown with pull-up/pull-down factors and improvement tips
What-If Sensitivity analysis + preset scenarios + custom sliders
Remediation Phased action plans per dimension with effort estimates
Regulatory Applicable regulations with governance penalties and audit requirements
AI Summary Executive summary generated by LLM (when enabled)

What We Learned

  1. Deterministic engines are underrated. The LLM is a nice-to-have, but the deterministic scoring engine handles 90% of use cases. It's fast (~1 second), predictable, and doesn't require users to set up external services.

  2. Gamification reduces friction. Users engaged more with tier badges and live preview than with a static form. The instant feedback loop makes the evaluation feel like a game rather than a survey.

  3. AI prefill is a trust cliff. When AI prefills scores, users trust it more if they can see and tweak every value. The fine-tune section is critical for building confidence.

  4. Synthetic benchmarks are surprisingly useful. Even though they're generated, they provide a reference frame. Users want to know how their scores compare to "similar" workflows.

Getting Started

git clone https://github.com/harishkotra/ShouldWeAutomate.git
cd ShouldWeAutomate
pip install -r requirements.txt
python app.py

Enter fullscreen mode Exit fullscreen mode

How it works

Code & more: https://www.dailybuild.xyz/project/148-should-we-automate