惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

爱范儿
爱范儿
E
Exploit-DB.com RSS Feed
Google DeepMind News
Google DeepMind News
F
Full Disclosure
D
Darknet – Hacking Tools, Hacker News & Cyber Security
T
ThreatConnect
Stack Overflow Blog
Stack Overflow Blog
Last Week in AI
Last Week in AI
Martin Fowler
Martin Fowler
G
GRAHAM CLULEY
C
Check Point Blog
T
Threatpost
I
Intezer
Spread Privacy
Spread Privacy
The Register - Security
The Register - Security
Project Zero
Project Zero
月光博客
月光博客
人人都是产品经理
人人都是产品经理
阮一峰的网络日志
阮一峰的网络日志
D
DataBreaches.Net
IT之家
IT之家
Malwarebytes
Malwarebytes
T
The Blog of Author Tim Ferriss
P
Privacy International News Feed
P
Palo Alto Networks Blog
T
The Exploit Database - CXSecurity.com
量子位
李成银的技术随笔
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Cisco Talos Blog
Cisco Talos Blog
Know Your Adversary
Know Your Adversary
美团技术团队
The GitHub Blog
The GitHub Blog
T
Tor Project blog
M
MIT News - Artificial intelligence
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Google Online Security Blog
Google Online Security Blog
P
Proofpoint News Feed
有赞技术团队
有赞技术团队
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
博客园 - 司徒正美
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
C
Comments on: Blog
T
Threat Research - Cisco Blogs
aimingoo的专栏
aimingoo的专栏
Security Latest
Security Latest
NISL@THU
NISL@THU
The Cloudflare Blog
H
Help Net Security
Recent Commits to openclaw:main
Recent Commits to openclaw:main

The Cloudflare Blog

The day my ping took countermeasures Announcing Claude Compliance API support with Cloudflare CASB Announcing Claude Managed Agents on Cloudflare Project Glasswing: what Mythos showed us Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse Browser Run: now running on Cloudflare Containers, it’s faster and more scalable When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug Building For The Future How Cloudflare responded to the “Copy Fail” Linux vulnerability When DNSSEC goes wrong: how we responded to the .de TLD outage Code Orange: Fail Small is complete. The result is a stronger Cloudflare network Introducing Dynamic Workflows: durable execution that follows the tenant Post-quantum encryption for Cloudflare IPsec is generally available Agents can now create Cloudflare accounts, buy domains, and deploy Shutdowns, power outages, and conflict: a review of Q1 2026 Internet disruptions Making Rust Workers reliable: panic and abort recovery in wasm‑bindgen Moving past bots vs. humans Building the agentic cloud: everything we launched during Agents Week 2026 The AI engineering stack we built internally — on the platform we ship Orchestrating AI Code Review at scale Introducing the Agent Readiness score. Check to see if your site is agent-ready Shared Dictionaries: compression that keeps up with the agentic web Redirects for AI Training enforces canonical content Unweight: how we compressed an LLM 22% without sacrificing quality Agents that remember: introducing Agent Memory Agents Week: network performance update Introducing Flagship: feature flags built for the age of AI Cloudflare’s AI Platform: an inference layer designed for agents Building the foundation for running extra-large language models AI Search: the search primitive for your agents Deploy Postgres and MySQL databases with PlanetScale + Workers Artifacts: versioned storage that speaks Git Email for agents - Cloudflare Email Service now in public beta Project Think: building the next generation of AI agents on Cloudflare Introducing Agent Lee - a new interface to the Cloudflare stack Register domains wherever you build: Cloudflare Registrar API now in beta Browser Run: give your agents a browser Rearchitecting the Workflows control plane for the agentic era Add voice to your agent Managed OAuth for Access: make internal apps agent-ready in one click Securing non-human identities: automated revocation, OAuth, and scoped permissions Scaling MCP adoption: Our reference architecture for simpler, safer and cheaper enterprise deployments of MCP Secure private networking for everyone: users, nodes, agents, Workers — introducing Cloudflare Mesh Building a CLI for all of Cloudflare Durable Objects in Dynamic Workers: Give each AI-generated app its own database Agents have their own computers with Sandboxes GA Dynamic, identity-aware, and secure Sandbox auth Welcome to Agents Week 500 Tbps of capacity: 16 years of scaling our global network From bytecode to bytes- automated magic packet generation Cloudflare targets 2029 for full post-quantum security How we built Organizations to help enterprises manage Cloudflare at scale Why we're rethinking cache for the AI era Our ongoing commitment to privacy for the 1.1.1.1 public DNS resolver Introducing EmDash — the spiritual successor to WordPress that solves plugin security Introducing Programmable Flow Protection: custom DDoS mitigation logic for Magic Transit customers Cloudflare Client-Side Security: smarter detection, now open to everyone How we use Abstract Syntax Trees (ASTs) to turn Workflows code into visual diagrams A one-line Kubernetes fix that saved 600 hours a year Sandboxing AI agents, 100x faster Inside Gen 13- how we built our most powerful server yet Launching Cloudflare’s Gen 13 servers- trading cache for cores for 2x edge compute performance Powering the agents: Workers AI now runs large models, starting with Kimi K2.5 Introducing Custom Regions for precision data control Standing up for the open Internet- why we appealed Italy’s Piracy Shield fine From legacy architecture to Cloudflare One Announcing Cloudflare Account Abuse Protection: prevent fraudulent attacks from bots and humans Slashing agent token costs by 98% with RFC 9457-compliant error responses AI Security for Apps is now generally available Building a security overview dashboard for actionable insights Investigating multi-vector attacks in Log Explorer Translating risk insights into actionable protection: leveling up security posture with Cloudflare and Mastercard Fixing request smuggling vulnerabilities in Pingora OSS deployments Active defense: introducing a stateful vulnerability scanner for APIs Complexity is a choice. SASE migrations shouldn’t take years. From the endpoint to the prompt: a unified data security vision in Cloudflare One Ending the "silent drop": how Dynamic Path MTU Discovery makes the Cloudflare One Client more resilient A QUICker SASE client: re-building Proxy Mode How Automatic Return Routing solves IP overlap Always-on detections: eliminating the WAF “log versus block” trade-off Mind the gap: new tools for continuous enforcement from boot to login Stop reacting to breaches and start preventing them with User Risk Scoring Defeating the deepfake: stopping laptop farms and insider threats Moving from license plates to badges: the Gateway Authorization Proxy Evolving Cloudflare’s Threat Intelligence Platform: actionable, scalable, and ETL-less Introducing the 2026 Cloudflare Threat Report See risk, fix risk: introducing Remediation in Cloudflare CASB How Cloudy translates complex security into human action From reactive to proactive: closing the phishing gap with LLMs Modernizing with agile SASE: a Cloudflare One blog takeover Beyond the blank slate: how Cloudflare accelerates your Zero Trust journey The truly programmable SASE platform Toxic combinations: when small signals add up to a security incident We deserve a better streams API for JavaScript The most-seen UI on the Internet? Redesigning Turnstile and Challenge Pages ASPA: making Internet routing more secure Bringing more transparency to post-quantum usage, encrypted messaging, and routing security How we rebuilt Next.js with AI in one week Cloudflare One is the first SASE offering modern post-quantum encryption across the full platform Cloudflare outage on February 20, 2026
Stop the Bots: Practical Lessons in Machine Learning
Cloudflare Team · 2019-02-20 · via The Cloudflare Blog

2019-02-20

5 min read

This post is also available in 简体中文.

Bot-powered credential stuffing is a scourge on the modern Internet. These attacks attempt to log into and take over a user’s account by assaulting password forms with a barrage of dictionary words and previously stolen account credentials, with the aim of performing fraudulent transactions, stealing sensitive data, and compromising personal information.

At Cloudflare we’ve built a suite of technologies to combat bots, many of them grounded in Machine Learning. ML is a hot topic these days, but the literature tends to focus on improving the core technology — and not how these learning machines are incorporated into real-world organizations.

Given how much experience we have with ML (which we employ for many security and performance products, in addition to bot management), we wanted to share some lessons learned with regard to how this technology manifests in actual products.

There tend to be three stages every company goes through in the life cycle of infusing machine learning into their DNA. They are:

  • Business Intelligence

  • Standalone Machine Learning

  • Machine Learning Productization

These concepts are a little abstract — so let’s walk through how they might apply to a tangible field we all know and love: dental insurance.

Business Intelligence

Many companies already have some kind of business intelligence: an ability to sort, search, filter and perform rudimentary labeling on a large corpus of data. Business Intelligence is not machine learning per se, but it can serve as its foundation.

  • Imagine your friendly neighborhood dental insurance company, ACMEDental, regularly receives dental claims. For each claim, a trained insurance professional evaluates the incoming X-rays to confirm the dentist’s diagnosis — and marks the claim as accurate or inaccurate.

  • On the surface, this data provides actionable intelligence: an uptick in incorrect diagnoses from a particular dentist or region might warrant further investigation. But this data can be used for something much more interesting.

Standalone Machine Learning

The key to training a machine learning model is to compile a labeled data set: raw data paired with a descriptive label that tells the computer what it’s looking at. These data sets are often a natural byproduct of Business Intelligence, as we’re about to see.

  • Some ingenious engineers at ACMEDental notice that through their day-to-day operations they’ve compiled a large repository of X-rays, labeled either ‘accurate’ or ‘inaccurate’.

  • With several thousand labeled X-rays in hand, they decide to train a machine learning algorithm that can judge X-rays automatically. They use one of several open-source tools to do this using image recognition, yielding an ML algorithm that can scan an X-ray and categorize the claim with impressive accuracy.

Machine Learning Productization

Once a functioning algorithm is in-hand, there is the matter of turning it into a product. This generally involves collaboration between engineering, product design, and ultimately, business development and sales.

  • ACMEDental finds its new ML algorithm to be so effective that its product managers decide to offer it for license to X-ray machine manufacturers. By integrating the ACMEDental algorithm, a dentist could accelerate their workflow and reduce the likelihood of incorrect diagnoses. The machine could also communicate with the insurance company for instant claim approvals.

  • After productizing the algorithm for integration, ACMEDental’s business development team reaches out to manufacturers like Samsung to pursue a partnership.

  • The new X-ray machines, enhanced by ACMEDental’s algorithm, prove to be a hit on the market — allowing dentists to offload routine diagnostics to their assistants.

With this framework in mind let’s explore how we’re leveraging machine learning at Cloudflare.

To inform our machine learning models, we rely on data from the 13 million domains on Cloudflare’s network, which sees more than 660 billion requests per day serving more than 2.8 billion people per month. We employ this huge volume of data to address one of the most urgent security threats on the web: bot attacks.

Business Intelligence

We recently analyzed one day’s worth of requests from across all of Cloudflare’s 13 million domains -- 660 billion requests -- and labeled each potential attack with a ‘bot score’ ranging from 0 to 100. Cloudflare already has an extensive array of tools which help inform this score, but in principle, we could manually label each datapoint as ‘bot’ or ‘not bot’, similar to the dentist example above.

One initial takeaway from our analysis was the geographical breakdown of attacks. Rudimentary bot-protection tools rely on blocking IP addresses from countries commonly associated with hostile traffic. But when we ranked the origin of bots across our network, it was obvious that this approach was untenable as most attacks originate from countries with huge volumes of legitimate traffic. A more sophisticated solution allows us to protect against bots without affecting real users.

Standalone Machine Learning

Having compiled a large dataset sampled from Cloudflare’s network, our team of ML experts developed a state-of-the-art model to predict automated credential stuffing and other bot attacks.

Training: We started with a trillion requests (compiled over multiple days) in our training set, each of them labeled with the aforementioned bot score — and analyzed their features on a CPU/GPU cluster to recognize trends in malicious traffic. We used CatBoost, a gradient boosting library similar to XGBoost.

Validation: Though we perform hundreds of independent validations tests, the ultimate validation is how many login attempts have been challenged with a Captcha, versus how many have solved a Captcha (a solved Captcha likely indicates a false positive).

Over the course of a week, we deployed our solution to 95 websites with WordPress login pages and found that we issued more than 660,000 challenges, of which only 0.32% were solved — meaning our algorithm detected bots with a 99.68% rate of True Positives.

While this work is just beginning, WordPress represents 32.5% of all websites worldwide, so this is a very meaningful step forward. Meanwhile, we found that more than 80% of requests to WordPress login pages are Credential Stuffing attacks, underscoring just how prevalent these attacks are.

Machine Learning Productization

Deployment: Once we became sufficiently confident in the accuracy of our algorithm, we deployed it as one of the many security features constantly running across the 165 server facilities worldwide that comprise Cloudflare’s network edge. Today, it evaluates over 660 billion requests per day — learning and improving from each additional attack observed on the network.

But it wasn’t long before we found ourselves asking: is knowing and stopping credential stuffing attacks in real-time enough? Can we do more?

We used our existing data and began thinking of how to develop an algorithm to predict which companies will be attacked next. This would allow us to proactively warn companies before an attack occurs and prepare them for the worst case scenario, even if they aren’t currently on Cloudflare’s network.

Training: We trained our Machine Learning Model with firmographic information like industry type, number of employees, and the revenue amount to predict the percentage of login attacks.

Results: Amongst the many interesting findings is that smaller companies have a higher fraction of attacks. Intuitively, this makes sense because the opportunistic attack traffic scanning the Internet is equal on all pages while the human traffic on popular sites outweighs the bot attacks. Thus even though small companies represent a smaller traffic volume, they're the most vulnerable to attack.

Do these findings only affect the 13 million Cloudflare websites with 2.8 billion visitors across 237 countries that we had in our model? Likely not. That means Cloudflare can begin thinking about helping all companies by proactively anticipating attacks based on their unique risk profile.

Machine learning has been crucial to the development of our next generation of products, and we’ve only scratched the surface. We hope that this post is useful as you map out the trajectory of ML in your own organization: your road will be different, but hopefully you will see some familiar milestones along the way.

Cloudflare is actively developing its machine learning capabilities — if you’re interested in joining us or partnering with us on our mission please get in touch.

AttacksSecurity

Related posts

May 18, 2026

Project Glasswing: what Mythos showed us

In recent weeks, we pointed Mythos and other security-focused LLMs at live code across critical parts of our infrastructure. We share what we observed, the models’ strengths and weaknesses, and what the work around them needs to look like before any of it can scale....

    By 

May 07, 2026

How Cloudflare responded to the “Copy Fail” Linux vulnerability

When a critical Linux kernel privilege escalation was publicly disclosed, Cloudflare's security and engineering teams detected, investigated, and mitigated the threat across our global fleet, confirming zero customer impact and no malicious exploitation....

    By