Designing Website Analytics for AI Crawlers Without Surveillance
WebmasterID · 2026-05-26 · via DEV Community

tags: seo, analytics, webdev, ai

Most website analytics still start from the same old question: who visited the site?

That question is useful, but it is no longer enough. Modern sites are also read by search crawlers, AI crawlers, preview bots, monitoring tools, and assistants that may later send a human referral. If all of that traffic is flattened into one session stream, the operator loses the ability to understand how the machine-readable web is actually interacting with the site.

The interesting work is not just adding another bot filter. It is designing analytics so human traffic, crawler traffic, AI visibility, and referrals can be seen as different signals without turning the product into surveillance software.

The traffic model changed

A traditional analytics setup is usually optimized around pageviews, sessions, referrers, campaigns, and conversion paths. That model works for human behavior. It is weaker when the visitor is a crawler that may never execute JavaScript, may fetch only a subset of pages, may identify itself inconsistently, and may influence discovery later without creating a normal click path.

AI crawlers make this more visible. A page might be read by GPTBot, ClaudeBot, PerplexityBot, or another agent-like client long before a person arrives from an AI answer. (Google-Extended is often listed alongside these, but Google documents it as a robots.txt control token rather than a crawler with its own user-agent string.) Treating those requests as noise hides a useful operational signal: which parts of the site are legible to machines, how often important pages are revisited, and whether AI-facing discovery is concentrated in the pages you actually want represented.

For operators, the question becomes less about vanity traffic and more about evidence. Did the docs get crawled after a deploy? Are product pages visible to AI systems? Are crawler spikes tied to a content change, a sitemap change, or an external mention? Those are infrastructure questions, not marketing-dashboard questions.
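The first of those questions can be sketched as a simple check over crawler hits. This is a minimal illustration, assuming hits have already been classified and pulled from server or edge logs; the `CrawlerHit` shape and the `uncrawledSinceDeploy` name are hypothetical and not taken from any specific product.

```typescript
// Hypothetical shape of one classified crawler hit taken from server or edge logs.
interface CrawlerHit {
  path: string;
  actor: "ai_crawler" | "search_bot";
  timestamp: number; // Unix epoch milliseconds
}

// Returns the watched paths that no crawler has fetched since the deploy.
function uncrawledSinceDeploy(
  hits: CrawlerHit[],
  watchedPaths: string[],
  deployedAt: number,
): string[] {
  // Collect every path that was fetched at or after the deploy time.
  const crawled = new Set(
    hits.filter((hit) => hit.timestamp >= deployedAt).map((hit) => hit.path),
  );
  return watchedPaths.filter((path) => !crawled.has(path));
}
```

An empty result means every watched page has been fetched at least once since the deploy. It says nothing about how the crawler used the content.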

Separate classification from tracking

A cleaner architecture starts by separating classification from tracking.

Tracking answers what happened. Classification answers what kind of actor produced the event. Those should not be mixed together too early. A human browser, a search bot, an AI crawler, and an uptime probe can all produce requests, but the analysis layer should not pretend they mean the same thing.

A small version of the pattern looks like this:

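// Patterns matched against the User-Agent header. Matching is case-insensitive.
const AI_CRAWLERS = [
  /GPTBot/i,
  /ClaudeBot/i,
  /PerplexityBot/i,
  // Google documents Google-Extended as a robots.txt control token with no
  // separate user-agent string, so this pattern is unlikely to match requests.
  /Google-Extended/i,
];

type RequestClass = "ai_crawler" | "search_bot" | "human_or_unknown";

// Classify a request by its User-Agent header alone.
export function classifyRequest(userAgent: string | null): RequestClass {
  // Treat a missing header as an empty string so the pattern tests stay simple.
  const ua = userAgent ?? "";

  if (AI_CRAWLERS.some((pattern) => pattern.test(ua))) {
    return "ai_crawler";
  }

  if (/Googlebot|Bingbot|DuckDuckBot/i.test(ua)) {
    return "search_bot";
  }

  // No known crawler pattern matched; this does not prove a human visitor.
  return "human_or_unknown";
}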


This is not a complete bot intelligence system. User-agent matching alone is incomplete, and user-agent strings are easy to spoof. But it shows the boundary: classification should be explicit, inspectable, and allowed to carry a confidence level. A mature version can add reverse DNS checks, known crawler lists, IP range validation where appropriate, edge logs, and confidence labels.

The important part is that the operator can see the decision. If the system says a request was an AI crawler, it should be able to explain why.
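A classifier that can explain itself might return its evidence along with the label. The sketch below is one possible shape, not a reference implementation. The `Classification` type, the two-level confidence scale, and the `ipVerified` flag are assumptions made for illustration; `ipVerified` stands for the outcome of a reverse DNS or IP range check performed elsewhere.

```typescript
type Actor = "ai_crawler" | "search_bot" | "human_or_unknown";

// Hypothetical result shape: the label plus the evidence behind it.
interface Classification {
  actor: Actor;
  confidence: "high" | "low";
  reasons: string[];
}

const RULES: Array<{ actor: Actor; pattern: RegExp }> = [
  { actor: "ai_crawler", pattern: /GPTBot|ClaudeBot|PerplexityBot/i },
  { actor: "search_bot", pattern: /Googlebot|Bingbot|DuckDuckBot/i },
];

// `ipVerified` is an assumed input: true when a separate reverse DNS or
// IP range check has confirmed the request source.
function classifyWithEvidence(
  userAgent: string | null,
  ipVerified: boolean,
): Classification {
  const ua = userAgent ?? "";

  for (const rule of RULES) {
    const match = ua.match(rule.pattern);
    if (match) {
      return {
        actor: rule.actor,
        // A user-agent match alone is only directional evidence.
        confidence: ipVerified ? "high" : "low",
        reasons: [
          `user-agent contains "${match[0]}"`,
          ipVerified
            ? "source IP verified"
            : "source IP not verified; user-agent may be spoofed",
        ],
      };
    }
  }

  return {
    actor: "human_or_unknown",
    confidence: "low",
    reasons: ["no known crawler pattern matched"],
  };
}
```

Storing `reasons` next to each event lets an operator audit a label later instead of trusting an opaque score.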

Privacy still matters

AI visibility should not become an excuse to rebuild invasive analytics.

You can measure a lot without fingerprinting people, setting third-party cookies, or storing raw IP addresses. First-party events, coarse request metadata, anonymized network information, respectful handling of DNT and GPC, and clear bot classification can cover a large part of the operational need.
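As a rough sketch of what that can look like at ingestion time, the code below coarsens an IPv4 address before storage and checks for the `DNT: 1` and `Sec-GPC: 1` request headers. The function names, the `StoredEvent` shape, and the policy of dropping network information when an opt-out signal is present are assumptions for illustration. Zeroing the last octet is one common coarsening choice, not a guarantee of anonymity, and IPv6 is left out for brevity.

```typescript
// Coarsen an IPv4 address by zeroing its last octet.
// Returns null for anything that is not a dotted-quad IPv4 address.
function coarsenIpv4(ip: string): string | null {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((part) => /^\d{1,3}$/.test(part) && Number(part) <= 255);
  if (!valid) return null;
  return `${parts[0]}.${parts[1]}.${parts[2]}.0`;
}

// True when the request carries "DNT: 1" or "Sec-GPC: 1".
// Header names are compared case-insensitively.
function hasOptOutSignal(headers: Record<string, string | undefined>): boolean {
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    if ((key === "dnt" || key === "sec-gpc") && value?.trim() === "1") {
      return true;
    }
  }
  return false;
}

// Hypothetical stored event: the raw IP never reaches storage.
interface StoredEvent {
  path: string;
  network: string | null;
}

function toStoredEvent(
  path: string,
  ip: string,
  headers: Record<string, string | undefined>,
): StoredEvent {
  // Assumed policy: on an opt-out signal, keep the pageview but no network data.
  const network = hasOptOutSignal(headers) ? null : coarsenIpv4(ip);
  return { path, network };
}
```

How to respond to an opt-out signal is a policy decision. Some operators may prefer to record nothing at all for those requests.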

That restraint matters because AI-search visibility sits close to technical SEO, content operations, and infrastructure monitoring. The goal is not to identify every person. The goal is to understand how the site is being read, by whom at a category level, and whether important surfaces are visible to the systems that now mediate discovery.

A useful analytics product should make that distinction obvious in the data model. Human behavior belongs in one lane. Bot and crawler visibility belongs in another. AI referrals belong in another. Joining them is useful; confusing them is not.
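One way to express those lanes in a data model is a discriminated union, so an event cannot be counted in the wrong lane by accident. The type and field names below are hypothetical, and the sketch assumes events arrive already classified.

```typescript
interface HumanEvent {
  lane: "human";
  path: string;
  timestamp: number;
}

interface CrawlerEvent {
  lane: "crawler";
  path: string;
  timestamp: number;
  actor: "ai_crawler" | "search_bot";
  confidence: "high" | "low";
}

interface AiReferralEvent {
  lane: "ai_referral";
  path: string;
  timestamp: number;
  referrerHost: string;
}

// Every event belongs to exactly one lane.
type AnalyticsEvent = HumanEvent | CrawlerEvent | AiReferralEvent;

interface PathSummary {
  humanViews: number;
  crawlerFetches: number;
  aiReferrals: number;
}

// Join the lanes by path while keeping each lane's count separate.
function summarizeByPath(events: AnalyticsEvent[]): Map<string, PathSummary> {
  const summary = new Map<string, PathSummary>();
  for (const event of events) {
    const row =
      summary.get(event.path) ??
      { humanViews: 0, crawlerFetches: 0, aiReferrals: 0 };
    switch (event.lane) {
      case "human":
        row.humanViews += 1;
        break;
      case "crawler":
        row.crawlerFetches += 1;
        break;
      case "ai_referral":
        row.aiReferrals += 1;
        break;
    }
    summary.set(event.path, row);
  }
  return summary;
}
```

The join happens at query time, by path, so the stored events never have to pretend that a crawler fetch is a pageview.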

What operators should be able to prove

The practical test is simple. After shipping a change, an operator should be able to answer a few questions without guesswork:

  • Which pages were visited by humans?
  • Which pages were crawled by search bots?
  • Which pages were read by AI crawlers?
  • Which referrals came from AI assistants or AI search surfaces?
  • Which events are high confidence, and which are only directional?

That is the shape WebmasterID is built around: first-party analytics, AI crawler visibility, bot intelligence, and AI referral attribution in one operator-oriented view. The point is not to invent certainty where the web does not provide it. The point is to make the uncertainty visible enough that a real operator can act on it.
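The AI referral question can be approximated by matching the host of the `Referer` header against a list of known assistant domains. The sketch below is not a description of how WebmasterID does it. The host list is illustrative rather than a vetted registry, and many AI-driven visits arrive with no referrer at all, so a match is only directional evidence.

```typescript
// Illustrative host list; an assumption, not a vetted registry.
const AI_REFERRER_HOSTS = ["chatgpt.com", "perplexity.ai"];

// True when the referrer URL's host is a listed domain or one of its subdomains.
function isAiReferral(referrer: string | null): boolean {
  if (!referrer) return false;

  let host: string;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch {
    // Malformed referrer values are treated as "not an AI referral".
    return false;
  }

  return AI_REFERRER_HOSTS.some(
    (known) => host === known || host.endsWith(`.${known}`),
  );
}
```

Matching on the exact host or a dot-prefixed suffix avoids false positives from lookalike domains such as `notperplexity.ai`.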

Good analytics for the AI-search era should feel less like a growth hack and more like observability. It should show what happened, preserve the difference between humans and machines, and give the person responsible for the site a clear trail from signal to decision.