Free Open-Source AEO Tracker: Our Real Score Was 33/100
Alex Isa · 2026-04-27 · via DEV Community

TL;DR

@webappski/aeo-tracker is an open-source Answer Engine Optimization (AEO) tracker — a Node.js CLI that measures brand visibility on ChatGPT, Gemini, Claude, and Perplexity via official APIs. Free, MIT-licensed, ~$0.20 per weekly run, zero runtime dependencies. Open-source alternative to Profound, Otterly, Peec.ai, and HubSpot's AEO Grader — calls the real provider APIs (no web scraping), saves every raw response to disk for audit, and uses a two-model LLM cross-check on competitor extraction to filter hallucinated brand names.

AEO (also known as GEO — Generative Engine Optimization) is the discipline of measuring and improving how often AI answer engines name your brand; this tool is the measurement half. Below: why we built it, the real numbers it gave us, the actual extractor code, and the quickstart.

Why I didn't pay for Profound, Otterly, or Peec.ai

Every AEO tracker I tried gave me a different number for the same brand. HubSpot's free AEO Grader scored us 28 out of 100. One paid dashboard said 44. A third refused to index the brand at all. A fourth was gated behind a $400-per-month plan before it would run a single query. None of them would show me the actual ChatGPT response they scored against.

As an engineer, I found this untenable. I needed answers to two questions no vendor wanted to address: (1) Are you actually calling ChatGPT, or are you scraping Bing and inferring? ChatGPT's API uses its own grounding pool; Bing's SERP uses another. A "ChatGPT visibility score" derived from web scraping is not a ChatGPT visibility score. (2) What counts as a mention? If my brand appears only in a cited URL but not in the answer text, do you count that?

Nobody had documented answers. Profound, Otterly, and Peec.ai are closed-source dashboards with proprietary scoring layers; HubSpot's grader sits on top of a web-scrape pipeline that anyone can replicate but nobody publishes. I stopped paying and built my own.

Three commands from install to HTML report

npm install -g @webappski/aeo-tracker
aeo-tracker init --auto
aeo-tracker run
aeo-tracker report --html

init --auto fetches your homepage, asks an LLM to suggest category-appropriate queries, validates them with a second model, and writes a config.

run calls each AI engine whose API key is set in your shell env. Model IDs are config-driven defaults you can override per run. Current defaults in lib/config.js are gpt-5-search-api, gemini-2.5-pro, claude-sonnet-4-6 and sonar-pro; pass any other provider-supported model via --model openai=… or by editing .aeo-tracker.json.

report --html renders a Markdown report with inline SVG charts plus a fully interactive HTML dashboard.
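The key-driven engine selection that run performs can be pictured with a minimal sketch. This is not the tool's actual code: the ENGINES table, the resolveEngines helper, and the ANTHROPIC_API_KEY and PERPLEXITY_API_KEY variable names are assumptions for illustration. Only the OpenAI and Gemini variable names and the four default model IDs come from this article.

```javascript
// Sketch only. The table shape, the engine ids other than "openai", and the
// Anthropic/Perplexity env var names are assumptions, not the tool's code.
const ENGINES = [
  { id: 'openai',     envKey: 'OPENAI_API_KEY',     defaultModel: 'gpt-5-search-api' },
  { id: 'gemini',     envKey: 'GEMINI_API_KEY',     defaultModel: 'gemini-2.5-pro' },
  { id: 'anthropic',  envKey: 'ANTHROPIC_API_KEY',  defaultModel: 'claude-sonnet-4-6' },
  { id: 'perplexity', envKey: 'PERPLEXITY_API_KEY', defaultModel: 'sonar-pro' },
];

// Keep only engines whose key is present, then apply per-run model overrides
// such as { openai: 'some-other-model' }.
function resolveEngines(env, overrides = {}) {
  return ENGINES
    .filter(engine => Boolean(env[engine.envKey]))
    .map(engine => ({ id: engine.id, model: overrides[engine.id] || engine.defaultModel }));
}

// With only the OpenAI and Gemini keys exported, two engines are selected.
console.log(resolveEngines(process.env));
```

An engine with no key is skipped rather than treated as an error, which is what lets the two-key minimum setup work.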

A few design choices that map directly to the frustrations above:

  • Direct API calls, nothing in between. No web scraping. No browser automation. No proxied sessions.
  • Pre-flight query validation. A separate LLM pass checks each query for ambiguity, acronym overload, and category drift before any tokens hit the engines.
  • Raw responses saved to disk. Every query × engine combination writes a JSON file under aeo-responses/YYYY-MM-DD/. Any number in the report is auditable back to the exact AI reply.
  • Zero runtime dependencies. package.json has no dependencies and no peerDependencies; grep it yourself. The whole CLI, including the SVG renderer, is plain Node.js 18+. Auditable in an afternoon.
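To make the "raw responses saved to disk" point concrete, here is a rough sketch of how a query × engine grid could map to files. The aeo-responses/YYYY-MM-DD/ directory comes from the list above; the file naming scheme, the slugify and planResponseFiles helpers, and the sample queries are hypothetical.

```javascript
// Sketch only. The <query-slug>__<engine>.json naming is a hypothetical
// scheme; the article specifies only the aeo-responses/YYYY-MM-DD/ directory.
function slugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// One file per query x engine cell, grouped under the run date (UTC).
function planResponseFiles(queries, engines, date = new Date()) {
  const day = date.toISOString().slice(0, 10);
  const files = [];
  for (const query of queries) {
    for (const engine of engines) {
      files.push(`aeo-responses/${day}/${slugify(query)}__${engine}.json`);
    }
  }
  return files;
}

// Placeholder queries: three queries x four engines gives twelve files.
const planned = planResponseFiles(
  ['Best voice form tool?', 'Voice input for web forms', 'Fill a form by speaking'],
  ['openai', 'gemini', 'anthropic', 'perplexity'],
  new Date('2026-04-23T00:00:00Z'),
);
console.log(planned.length, planned[0]);
```

Whatever the real naming is, the property that matters is one file per cell, so each number in the report traces back to exactly one stored reply.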

How does the two-model cross-check work?

The core design decision in the tracker is how it decides which competitor brands an AI answer mentioned. Single-model extractors hallucinate routinely: they confidently return brand names that never appeared in the source response. The fix is to ask two cheap LLMs in parallel to extract brand names from the same response, then merge their answers. A name both models return lands in the "verified" tier (solid badge in the report). A name only one model returns lands in the "unverified" tier (dashed badge). A name that does not appear verbatim in the source response is dropped before the merge.

Here's the actual prompt the extractor sends — it's the file lib/report/extract-competitors-llm.js, reproduced verbatim with comment headers stripped:

// Strict-JSON prompt. Identical for both models so responses are directly comparable.
export function buildExtractorPrompt({ text, brand, domain, category }) {
  const categoryLine = category
    ? `\nUSER CATEGORY: ${category}\nOnly names that are DIRECT ALTERNATIVES`
      + ` to the user's offering in this category qualify as competitors.`
      + ` Platforms/sources/publications mentioned as data or distribution`
      + ` channels do NOT qualify.`
    : '';

  return `You extract COMPETITOR brand/product/agency names from an AI
answer-engine response.

The user's brand is "${brand}" (domain: ${domain}).${categoryLine}

A COMPETITOR is a real company, product, or service that a buyer could
choose INSTEAD OF the user's brand, in the same category.

EXCLUDE (not competitors, even if mentioned as useful):
  - The user's own brand
  - AI-engines themselves (ChatGPT, Gemini, Claude, Perplexity)
    unless the user's category is "AI assistants"
  - Data sources / review platforms / social networks (Reddit, G2,
    Trustpilot, Quora, LinkedIn, Slack, Discord, YouTube, Wikipedia,
    TechCrunch, Wired, Yelp, Capterra) unless the user's category
    is "review platforms" or similar
  - Tooling unrelated to the category (Upwork, Toptal, Shopify, Zoom)
  - Metrics, KPIs, methodologies ("Citation Rate", "Share of Voice")
  - Process steps ("Build a Prompt Library", "Establish a Baseline")
  - Section headers ("Content Freshness", "Technical Optimization")
  - Names mentioned only as contrast ("Unlike X, we ...")

EXAMPLES:
  Category: "Answer Engine Optimization services"
    "Top AEO agencies: NoGood, Minuttia, Optimist"
      → brands: ["NoGood", "Minuttia", "Optimist"]
    "To get recommended by AI, get reviews on G2 and be mentioned on
     Reddit and TechCrunch"
      → brands: []   (G2, Reddit, TechCrunch are data sources)

  Category: "CRM software"
    "Leading CRMs include Salesforce, HubSpot, Pipedrive"
      → brands: ["Salesforce", "HubSpot", "Pipedrive"]

RULES:
  1. Return canonical form (original casing/punctuation from source).
  2. Do NOT invent names — every returned name must appear verbatim
     in the source text.
  3. Deduplicate.
  4. If nothing qualifies, return { "brands": [] } — being empty is
     correct and useful.

Return STRICT JSON, no markdown, no prose:
{ "brands": ["Name1", "Name2", ...] }

SOURCE TEXT:
${text}`;
}

// Hallucination guard — second line of defence after the merge step.
// Catches names a model invents that don't actually appear in the response.
export function filterHallucinations(brands, sourceText) {
  const lowerSource = (sourceText || '').toLowerCase();
  return brands.filter(name => lowerSource.includes(name.toLowerCase()));
}

The merge step is small but load-bearing: case-insensitive match on first-seen canonical form, both-model intersection becomes "verified", set-symmetric-difference becomes "unverified". A model that invents HubSpot when the response never mentions it gets its invented entry silently filtered before merge — the verbatim-substring check catches it. Two models invent the same hallucination far less often than one does.
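The merge code itself is not reproduced in this post, so the following is a minimal sketch of the behaviour the paragraph above describes, not the tracker's implementation. The names mergeExtractions and keepVerbatim and the return shape are hypothetical; keepVerbatim mirrors the substring check in filterHallucinations, applied here before the merge.

```javascript
// Sketch only. mergeExtractions and its return shape are hypothetical.
// keepVerbatim mirrors the substring check in filterHallucinations.
function keepVerbatim(brands, sourceText) {
  const lowerSource = (sourceText || '').toLowerCase();
  return brands.filter(name => lowerSource.includes(name.toLowerCase()));
}

function mergeExtractions(brandsA, brandsB, sourceText) {
  const seen = new Map(); // lowercased name -> { name, votes }
  for (const list of [brandsA, brandsB]) {
    const countedForThisModel = new Set(); // a repeated name is one vote
    for (const name of keepVerbatim(list, sourceText)) {
      const key = name.toLowerCase();
      if (countedForThisModel.has(key)) continue;
      countedForThisModel.add(key);
      const entry = seen.get(key) || { name, votes: 0 }; // first-seen casing wins
      entry.votes += 1;
      seen.set(key, entry);
    }
  }
  const verified = [];
  const unverified = [];
  for (const { name, votes } of seen.values()) {
    if (votes === 2) verified.push(name); // both models agree
    else unverified.push(name);           // only one model returned it
  }
  return { verified, unverified };
}

const source = 'Top AEO agencies: NoGood, Minuttia, Optimist';
const merged = mergeExtractions(
  ['NoGood', 'Minuttia', 'HubSpot'], // model A invents HubSpot
  ['nogood', 'Optimist'],            // model B uses different casing
  source,
);
console.log(merged);
// verified: ['NoGood'], unverified: ['Minuttia', 'Optimist']
```

HubSpot never reaches either tier because it fails the substring check, and "nogood" collapses into the first-seen "NoGood" through the case-insensitive key.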

What does a real AEO tracker output look like?

This is the score card the tracker produced for our brand on 2026-04-23:

AEO tracker overview — visibility score 33/100 PRESENT, 4 of 12 cells named brand, per-engine cards Claude 0%, ChatGPT 33%, Gemini 33%, Perplexity 67%

33 out of 100. PRESENT. Four out of twelve query-engine cells named the brand. Three queries × four engines = twelve cells. Pre-revenue baseline range is 0–15; six-month-old brands with SEO investment land 20–45; category leaders are 60–85. The tool charts movement over months, not grades for today.

Per-engine breakdown is where the real signal is: Perplexity 2/3 (strongest channel), ChatGPT 1/3 and Gemini 1/3 (one-each on different queries), Claude 0/3 (complete invisibility). Across our three test queries, Claude's grounding pool skewed toward dev.to, GitHub, and Product Hunt — domains where we don't yet have a footprint. Three queries, one category — treat it as a hypothesis to test on your own runs, not Anthropic policy.
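The arithmetic behind the score card can be checked with a small sketch. The cell shape ({ engine, mentioned }) and the function names are hypothetical; the per-engine hit counts are the ones reported above, and the sketch does not assume which query produced which hit.

```javascript
// Sketch only. The cell shape and function names are hypothetical.
function visibilityScore(cells) {
  if (cells.length === 0) return 0;
  const hits = cells.filter(cell => cell.mentioned).length;
  return Math.round((hits / cells.length) * 100);
}

function perEngine(cells) {
  const out = {};
  for (const { engine, mentioned } of cells) {
    out[engine] ??= { hits: 0, total: 0 };
    out[engine].total += 1;
    if (mentioned) out[engine].hits += 1;
  }
  return out;
}

// Hit counts per engine from the score card, three queries each.
const hitCounts = { Perplexity: 2, ChatGPT: 1, Gemini: 1, Claude: 0 };
const cells = Object.entries(hitCounts).flatMap(([engine, hits]) =>
  [0, 1, 2].map(i => ({ engine, mentioned: i < hits })));

console.log(visibilityScore(cells)); // 33 (4 of 12 cells)
console.log(perEngine(cells));
```

Rounding 2/3 and 1/3 the same way gives the 67% and 33% shown on the per-engine cards.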

The position matrix is even more interesting because it shows who AI named instead of you on each query:

AEO tracker position matrix — three queries × four engines, with TypelessForm at #1 on Gemini and Perplexity for

Aggregated across all cells: TypelessForm 4 mentions, AnveVoice 3, Wispr Flow 2, Form2Agent 1, Dragon by Nuance 1, Voiceform 1. The top-cited canonical source AI engines linked to was usevoicy.com — the same domain twice across two engines. One placement on usevoicy.com would propagate across every engine that grounds in it. That's an outreach target, not a content target.
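An aggregation like this is a plain tally over the saved cells. The sketch below shows one way to do it; the cell shape, the tally and hostOf helpers, and the sample data are all placeholders, not the tracker's code or our real responses.

```javascript
// Sketch only. The cell shape and the sample data are placeholders.
function tally(items) {
  const counts = new Map();
  for (const item of items) counts.set(item, (counts.get(item) || 0) + 1);
  // Highest count first; ties keep first-seen order because sort is stable.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Normalise a cited URL to its host so www and non-www count together.
function hostOf(url) {
  return new URL(url).hostname.replace(/^www\./, '');
}

const sampleCells = [
  { brands: ['BrandA', 'BrandB'], citations: ['https://www.example.com/list'] },
  { brands: ['BrandA'], citations: ['https://example.com/review', 'https://example.org/'] },
];

console.log(tally(sampleCells.flatMap(cell => cell.brands)));
// [['BrandA', 2], ['BrandB', 1]]
console.log(tally(sampleCells.flatMap(cell => cell.citations.map(hostOf))));
// [['example.com', 2], ['example.org', 1]]
```

Counting hosts rather than full URLs is what surfaces a domain that several engines ground in through different pages.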

How to run an AEO tracker on your own brand

Two API keys, minimum:

npm install -g @webappski/aeo-tracker

export OPENAI_API_KEY="sk-proj-..."
export GEMINI_API_KEY="AIzaSy..."

aeo-tracker init --yes --brand=YOURBRAND --domain=YOURDOMAIN.COM --auto
aeo-tracker run
aeo-tracker report --html

That covers the ChatGPT and Gemini columns at roughly $0.20 per weekly run. Add an Anthropic key for the Claude column (+~$0.30) or a Perplexity key for the Perplexity column (+~$0.05). Full four-engine coverage: ~$0.55 per run. Each provider's free tier is enough for the first month.
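For budgeting, the per-run figures above add up as follows. This is a back-of-the-envelope sketch using the approximate numbers quoted in this post, kept in cents to avoid floating-point drift; the function name and the 52-runs-per-year extrapolation are illustrative assumptions.

```javascript
// Approximate per-run API cost in US cents, from the figures quoted above.
const COST_CENTS = { openaiPlusGemini: 20, anthropic: 30, perplexity: 5 };

function runCostCents({ anthropic = false, perplexity = false } = {}) {
  return COST_CENTS.openaiPlusGemini
    + (anthropic ? COST_CENTS.anthropic : 0)
    + (perplexity ? COST_CENTS.perplexity : 0);
}

const fullRun = runCostCents({ anthropic: true, perplexity: true });
console.log(`per run: $${(fullRun / 100).toFixed(2)}`);                 // per run: $0.55
console.log(`52 weekly runs: $${((fullRun * 52) / 100).toFixed(2)}`);   // 52 weekly runs: $28.60
```

Actual spend depends on response length and on the models you configure, so treat these as order-of-magnitude estimates.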

After the first run, the workflow is two commands once a week: aeo-tracker run && aeo-tracker report --html. The HTML report auto-opens in your browser.

Why is this open source instead of a SaaS?

Because the measurement should be commodity. The interpretation and execution shouldn't.

I'm not building this alone — the tracker is the open-source half of what my consulting agency does for clients. We run Webappski, and a client who can independently run aeo-tracker run and see their own raw numbers is a client who can check our work. We charge for the rest — the third-party placements, the comparison pages, the authority building, and the weekly read-out that turns numbers into action. The CLI handles measurement; everything that turns measurement into mention growth is the consulting half.

If that's interesting: https://webappski.com/en/aeo-services. If not: the tool is yours anyway. No telemetry, no analytics, no traffic to our servers. Your keys and your data stay on your machine.

FAQ

How does it compare to Profound?

Profound is a closed-source dashboard. @webappski/aeo-tracker is an open-source CLI you install locally with npm. Profound aggregates engine results behind a proprietary scoring layer; aeo-tracker exposes the raw AI responses and lets you compute the score yourself. Profound starts in the high-three-figure range monthly; aeo-tracker is free + ~$0.20 per run in API spend. Trade-off: Profound has historical dashboards and a sales rep; aeo-tracker has source code and a git log.

How much does it cost to run?

The tool is free under MIT. You pay only for the AI API calls you make with your own keys: ~$0.20 per run at the two-engine minimum (OpenAI + Gemini), ~$0.55 per run for four-engine coverage (adding Anthropic + Perplexity). Each provider's free tier is enough to start.

What does a 33/100 score mean?

It means 4 out of 12 query-engine cells named the brand in the answer text — three queries × four engines = twelve cells. The tracker counts how many returned a verified mention. Reference ranges: 0–15 for a pre-revenue brand at launch; 20–45 for a 6-month-old brand with on-page SEO; 60–85 for the category leaders. The score is a snapshot, not a verdict — week-over-week diff is where the tool earns its keep.

What is "TypelessForm" — the brand the screenshots reference?

TypelessForm is the brand we tested the tracker on. It's a one-shot voice form-filling widget — drop a <script> tag on any HTML form and visitors can fill every field of the form by speaking one sentence. 25+ languages, GDPR-compliant, free tier. The product itself lives at https://typelessform.com (the hotel-booking write-up is the fastest way to see what "one-shot" means). The tracker is a separate project from the widget — same maintainer, different repo.

Try it and tell me what surprised you

Run it on your own brand and post your engine breakdown in the comments — the Claude vs Perplexity diff is the most surprising part. I want to see whether the 0/3 Claude blind spot is TypelessForm-specific or category-wide. Same pattern in your run? Fork the repo and let's debug it in an issue.