Training Your Mouse Behavior Clone: Make AI Browser Agents Move Like You
bing yu · 2026-05-17 · via DEV Community

In May 2026, a paper titled "FP-Agent: Fingerprinting AI Browsing Agents" was published on arXiv. The research team measured 7 mainstream AI browsing agents and found that their mouse trajectories, typing rhythms, and other behavioral features form a distinctive fingerprint -- not only distinguishing AI from humans, but also differentiating between different agent frameworks.

What's more concerning: the behavioral consistency across existing schemes makes them easy to classify as automated traffic.

This article explores, from a browser automation developer's perspective, how to use deep learning to make AI agent mouse operations learn your style instead of applying a generic "humanization" template.


The Problem: Why Does "Humanization" Become a Fingerprint?

Most mainstream browser automation frameworks handle "humanization" of mouse movements like this:

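import random
import time

def move_mouse(x, y):
    # bezier_curve, current_pos, and mouse stand in for framework internals
    points = bezier_curve(current_pos, x, y)
    for px, py in points:
        mouse.move(px, py)
        time.sleep(random.uniform(0.01, 0.03))  # uniform 10-30 ms delay per step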


Seems reasonable? But here's the problem:

  • Everyone using this framework generates the same class of Bezier curves
  • Random jitter follows the same distribution
  • Overshoot trigger probability is the same fixed value

If you run 1,000 automation instances and cluster their mouse trajectories, you'll find they heavily overlap. This creates a behavioral fingerprint — detect one pattern, identify all instances using that framework.

The irony: the more "humanization" features you add, the less human it becomes — because there's no unified "human" pattern. If all your automation instances share the same "humanization" parameters, those parameters themselves become a massive collective fingerprint.

A Different Approach: Learn "You", Not "Human"

If the model is trained on your personal mouse operation data and generates trajectories with your personal style, the situation changes completely:

| Dimension | Generic Humanization | Personal Behavior Clone |
| --- | --- | --- |
| Trajectory Distribution | Shared by all users | Unique per person |
| Detection Difficulty | One cluster identifies all | Requires per-person modeling |
| Model Size | N/A | ~2MB |

The core shift: instead of using more complex rules to simulate "human", let the model learn "you" from your own data.


Data Collection: Record Your Habits Non-Invasively

To do behavior cloning, first you need your personal mouse trajectory data.

The implementation is simple — a Tampermonkey userscript that listens to mousemove events and records the complete trajectory from mouse movement to click. Movements shorter than 20px are treated as stationary clicks and discarded. We care about movement patterns, not the click itself.

The data format is straightforward:

{
  "viewport": {"w": 2018, "h": 1075},
  "trajectory": [
    {"x": 1480, "y": 322, "t": 0},
    {"x": 1504, "y": 317, "t": 31},
    {"x": 1501, "y": 319, "t": 69}
  ],
  "target": {"tag": "DIV", "text": "Code"}
}


x/y are viewport coordinates, t is the time offset in milliseconds relative to the trajectory start. The target HTML tag is included because clicking a button vs. clicking a link produces genuinely different trajectories — buttons have larger target areas and more casual movements; links have smaller targets with more cautious end phases.

After a few days of normal browsing, you'll have hundreds to thousands of trajectories. Then use the Tampermonkey menu to export a .jsonl file and drop it into the project's data/ directory.
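
As a rough illustration of how such an export could be consumed, the sketch below parses JSONL records in the format shown above, drops movements shorter than 20px, and normalizes coordinates to the viewport. The function names are hypothetical, and measuring "shorter than 20px" by total path length (rather than straight-line displacement) is an assumption; the project's own loader may differ.

```python
import json
import math

MIN_MOVE_PX = 20  # movements shorter than this are treated as stationary clicks


def path_length(points):
    """Total distance travelled along the recorded points, in pixels."""
    return sum(
        math.hypot(b["x"] - a["x"], b["y"] - a["y"])
        for a, b in zip(points, points[1:])
    )


def parse_records(lines):
    """Parse JSONL lines, drop short movements, normalize x/y to the viewport."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        pts = rec["trajectory"]
        # Assumption: "shorter than 20px" is measured along the path.
        if len(pts) < 2 or path_length(pts) < MIN_MOVE_PX:
            continue
        w, h = rec["viewport"]["w"], rec["viewport"]["h"]
        samples.append({
            "tag": rec["target"]["tag"],
            "points": [(p["x"] / w, p["y"] / h, p["t"]) for p in pts],
        })
    return samples


# The record from the example above, serialized as one JSONL line.
example = json.dumps({
    "viewport": {"w": 2018, "h": 1075},
    "trajectory": [
        {"x": 1480, "y": 322, "t": 0},
        {"x": 1504, "y": 317, "t": 31},
        {"x": 1501, "y": 319, "t": 69},
    ],
    "target": {"tag": "DIV", "text": "Code"},
})
samples = parse_records([example])
```

Because `parse_records` accepts any iterable of lines, an open file handle for the exported .jsonl can be passed to it directly.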


Architecture: Three Models, Each Handles One Thing

With the data collected, the initial approach was to train a large GRU model for end-to-end spatial and temporal prediction. But experiments showed this didn't work well — the model would either learn overly smooth spatial trajectories (losing personalized arc styles) or uniform timing (losing acceleration/deceleration rhythms).

The solution was to decompose the problem into three modules:

Bezier (Skeleton)  →  NoiseModel (Spatial Deviation)  →  GRU (Timing)


  • Bezier Curve generates the macroscopic skeleton -- it's a fixed algorithm, doesn't learn anything, just guarantees a reasonable path from start to end.

  • NoiseModel is a small GRU (~166KB) that receives Bezier control points and outputs your personalized (x, y) path. It learns how you deviate from the ideal path -- what arcs you prefer, how much you jitter.

  • GRU (~2MB) takes the NoiseModel's spatial path and predicts only when each point is reached -- where you speed up, slow down, or hesitate.

Decomposing this way worked much better. Intuitively, the arc you take and when you accelerate/decelerate are two separate things. Learning them independently keeps each model cleaner.
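
To make the skeleton stage concrete, here is a minimal sketch of sampling a cubic Bezier curve (start, two control points, end). The function name, the point count, and the example control points are illustrative assumptions; the article does not say how the project chooses its control points.

```python
def cubic_bezier(p0, p1, p2, p3, n=30):
    """Sample n points on a cubic Bezier curve running from p0 to p3."""
    points = []
    for i in range(n):
        t = i / (n - 1)
        u = 1.0 - t
        # Bernstein basis weights for a cubic curve
        b0, b1, b2, b3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
        x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
        y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
        points.append((x, y))
    return points


# Hypothetical control points in viewport-normalized coordinates.
skeleton = cubic_bezier((0.0, 0.0), (0.3, 0.2), (0.7, 0.2), (1.0, 0.0))
```

The skeleton only guarantees a plausible route from start to end; everything personal is left to the two learned stages.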

How NoiseModel Learns Spatial Deviation

NoiseModel's input is 8 Bezier curve parameters (start point, end point, two control point coordinates, normalized to viewport). It autoregressively generates a series of (x, y) points.
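
The input layout and the autoregressive rollout can be sketched as follows. This is an illustration under assumptions: the feature ordering (start, end, control points), the function names, and the `straight_line_step` stand-in are hypothetical, and the real model is a trained GRU rather than the toy step function used here.

```python
def bezier_features(start, end, ctrl1, ctrl2, viewport_w, viewport_h):
    """Flatten four Bezier points into 8 viewport-normalized values."""
    feats = []
    for x, y in (start, end, ctrl1, ctrl2):
        feats.extend((x / viewport_w, y / viewport_h))
    return feats


def generate_path(features, step_fn, n_points=30):
    """Autoregressive rollout: each point is predicted from the previous one."""
    path = [(features[0], features[1])]  # begin at the normalized start point
    state = None  # recurrent hidden state, carried between steps
    for _ in range(n_points - 1):
        point, state = step_fn(features, path[-1], state)
        path.append(point)
    return path


def straight_line_step(features, prev, state):
    """Stand-in for the trained model: move 20% of the way to the end point."""
    end_x, end_y = features[2], features[3]
    nxt = (prev[0] + 0.2 * (end_x - prev[0]), prev[1] + 0.2 * (end_y - prev[1]))
    return nxt, state


features = bezier_features((1480, 322), (1501, 319), (1485, 300), (1495, 330), 2018, 1075)
path = generate_path(features, straight_line_step)
```

Swapping `straight_line_step` for a function that calls the trained network keeps the rollout loop unchanged.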

During training, the first 5 epochs use pure teacher forcing with real trajectory points; the teacher-forcing ratio is then gradually reduced until the model predicts entirely from its own outputs. This way it learns real data patterns while still being able to generate paths independently at inference time.
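
One way to express such a schedule is sketched below. The article only states that the first 5 epochs use pure teacher forcing and that it is then gradually reduced; the linear decay shape, the `decay_epochs` value, and the function names are assumptions.

```python
import random


def teacher_forcing_ratio(epoch, warmup_epochs=5, decay_epochs=45):
    """Ratio for a 0-indexed epoch: 1.0 during warm-up, then linear decay to 0.0."""
    if epoch < warmup_epochs:
        return 1.0
    progress = (epoch - warmup_epochs + 1) / decay_epochs
    return max(0.0, 1.0 - progress)


def next_input(real_point, predicted_point, ratio, rng=random):
    """Feed the real point with probability `ratio`, else the model's own output."""
    return real_point if rng.random() < ratio else predicted_point


schedule = [teacher_forcing_ratio(e) for e in range(100)]
```

With these assumed defaults the ratio stays at 1.0 for epochs 0 to 4, falls linearly afterwards, and reaches 0.0 at epoch 49, after which the model always consumes its own predictions.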

One detail: points in the final 20% of the trajectory get a 4x loss weight during training. The deceleration and fine-adjustment phase near the target is where human vs. machine differences are most pronounced: machines tend to arrive straight-on, while humans show subtle overshoot and correction.
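
The tail weighting can be sketched as a weighted mean squared error. This is a plain-Python illustration rather than the project's training code; the function names and the choice to normalize by the sum of weights are assumptions.

```python
def tail_weights(n_points, tail_fraction=0.2, tail_weight=4.0):
    """Weight 1.0 for early points, tail_weight for the final tail_fraction."""
    tail_start = n_points - int(round(n_points * tail_fraction))
    return [tail_weight if i >= tail_start else 1.0 for i in range(n_points)]


def weighted_mse(pred, target, weights):
    """Weighted mean squared error over (x, y) points."""
    total = 0.0
    for (px, py), (tx, ty), w in zip(pred, target, weights):
        total += w * ((px - tx) ** 2 + (py - ty) ** 2)
    return total / sum(weights)


weights = tail_weights(10)
target = [(0.0, 0.0)] * 10
early_miss = [(1.0, 0.0)] + [(0.0, 0.0)] * 9   # error on the first point
late_miss = [(0.0, 0.0)] * 9 + [(1.0, 0.0)]    # same error on the last point
loss_early = weighted_mse(early_miss, target, weights)
loss_late = weighted_mse(late_miss, target, weights)
```

The same one-pixel-unit error costs four times as much when it occurs in the final fifth of the trajectory, which pushes the model to spend its capacity on the approach to the target.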

How GRU Learns Timing

GRU's input is per-step "relative space" features: distance remaining to target, how far the last step moved, how long the last step took, current progress. Not absolute coordinates — because when humans move a mouse, the brain processes "the target is still that direction, roughly how far away", not "the cursor is at pixel coordinates on screen".
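
A minimal sketch of computing these per-step features from a spatial path follows. The function name is hypothetical, and defining "current progress" as the step index divided by the number of steps is an assumption; the project may measure progress by distance instead.

```python
import math


def relative_features(path, dts_ms):
    """Per-step features: (remaining distance, last step length, last step time, progress).

    path   -- list of (x, y) points
    dts_ms -- dts_ms[i] is the time taken to move from path[i-1] to path[i];
              dts_ms[0] is unused and treated as 0
    """
    end_x, end_y = path[-1]
    n = len(path)
    feats = []
    for i, (x, y) in enumerate(path):
        remaining = math.hypot(end_x - x, end_y - y)
        if i == 0:
            last_step, last_dt = 0.0, 0.0
        else:
            prev_x, prev_y = path[i - 1]
            last_step = math.hypot(x - prev_x, y - prev_y)
            last_dt = float(dts_ms[i])
        progress = i / (n - 1)  # assumption: progress by step index
        feats.append((remaining, last_step, last_dt, progress))
    return feats


feats = relative_features([(0, 0), (3, 4), (6, 8)], [0, 10, 20])
```

None of the four values depends on where the movement sits on screen, so two movements of the same shape produce the same feature sequence regardless of their absolute position.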

There's a gotcha with time handling: in the raw data, time differences between adjacent points range from 8ms to 3494ms (extreme outliers). Training directly on these values would be dominated by the outliers. The fix is a log1p transform, which compresses the range to [0, 8.8]; the model's predictions are then mapped back to milliseconds with the inverse expm1 transform.
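
The transform pair is small enough to show directly. In this sketch the helper names are hypothetical and the sample intervals are illustrative, apart from the 8ms and 3494ms extremes quoted above, which map to about 2.2 and 8.16.

```python
import math


def encode_dt(dt_ms):
    """Compress a time delta in milliseconds with log1p."""
    return math.log1p(dt_ms)


def decode_dt(value):
    """Invert encode_dt: map a model output back to milliseconds."""
    return math.expm1(value)


raw = [8, 16, 27, 120, 3494]
encoded = [encode_dt(dt) for dt in raw]
decoded = [decode_dt(v) for v in encoded]
```

In the raw values the largest interval is more than 400 times the smallest; after encoding it is less than 4 times larger, so a single outlier no longer dominates a squared-error loss.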

Another practical finding: the model tends to predict longer durations than the real ones. Applying a 0.70 scaling factor to the predicted times after training makes the generated total duration match the real median.
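
A calibration factor of this kind could be derived as the ratio of medians, as sketched below. The article does not say how the 0.70 value was obtained, so this derivation, the function names, and the durations in the example are all illustrative assumptions.

```python
import statistics


def duration_scale(real_durations_ms, generated_durations_ms):
    """Ratio that maps the generated median total duration onto the real median."""
    return statistics.median(real_durations_ms) / statistics.median(generated_durations_ms)


def apply_scale(dts_ms, scale):
    """Scale every predicted step time by the same factor."""
    return [dt * scale for dt in dts_ms]


# Made-up total durations, chosen so the ratio comes out at 0.7.
real = [640, 700, 910]
generated = [900, 1000, 1300]
scale = duration_scale(real, generated)
scaled_steps = apply_scale([100, 200], scale)
```

Because every step is multiplied by the same constant, the relative rhythm of acceleration and hesitation is preserved; only the overall duration shrinks.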

Why Not Transformer

You might ask, why GRU when everyone's using Transformers now? Three reasons:

  1. Trajectories are strongly continuous, unidirectional sequences, which matches GRU's inductive bias
  2. Inference is autoregressive and each step depends only on the previous one, so a GRU is much lighter than a Transformer
  3. The dataset is small (hundreds of samples), and a small GRU tends to generalize better than a Transformer on so little data

Training: Are a Few Hundred Samples Enough?

Honest answer: a few hundred is barely enough. The ideal is thousands per person.

In practice, both models are trained entirely on real data, no synthetic data dependency. The training strategy is to upweight the few hundred real samples (default ×10), telling the optimizer "prioritize fitting these real samples first".

Training speed is also decent: GRU single epoch on GPU is about 45 seconds, 200 epochs under 3 minutes. CPU works too, just takes a bit longer.

Final output: NoiseModel ~166KB, GRU ~2MB. Two files totaling under 3MB, CPU inference latency under 5ms.


An Overlooked Detail: Event Rate

GRU-generated trajectories typically have only ~30 points, with ~27ms intervals — roughly 37Hz sampling rate.

But what's the real mouse sampling rate? Browser-captured mousemove is usually around 60Hz (~16ms). Actual mouse hardware sampling rate is typically 125Hz (~8ms).

30 points looks too sparse. More critically: if your automation always produces exactly 30 points, that's an obvious artificial fingerprint. Real human mouse movements have variable event counts — a 700px movement might generate 20 effective events when moving fast, or 80 when moving slowly with fine adjustments.

So a resampling layer is added after GRU output — "translating" the model's predicted timing contour into real mouse event rates:

  • Adaptive intervals: sparse during fast phases (~14ms), dense during deceleration (~4ms)
  • ±3ms time jitter: simulates hardware sampling noise
  • ±0.3px spatial jitter: simulates sensor accuracy
  • Variable point count: same start/end generates 20-80+ events each time

This step only does "translation", it doesn't change the model's predicted velocity curve. When the model says "this action takes 800ms, accelerates in the middle, decelerates at the end", resampling just breaks that into irregularly-spaced events.
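
A resampling layer along these lines might look like the sketch below. It is an illustration under assumptions, not the project's implementation: the function names, the linear mapping from local speed to event interval, the 1ms minimum spacing, and the smoothstep demo trajectory are all hypothetical.

```python
import bisect
import math
import random


def interpolate(points, times, t):
    """Linearly interpolate (x, y) at time t along [(x, y, t_ms), ...]."""
    i = bisect.bisect_right(times, t)
    if i == 0:
        return points[0][0], points[0][1]
    if i >= len(points):
        return points[-1][0], points[-1][1]
    (x0, y0, t0), (x1, y1, t1) = points[i - 1], points[i]
    f = (t - t0) / (t1 - t0)
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


def resample(points, fast_ms=14.0, slow_ms=4.0, t_jitter=3.0, xy_jitter=0.3, rng=random):
    """Turn model output into irregular events without altering the velocity profile."""
    times = [p[2] for p in points]
    speeds = [
        math.hypot(b[0] - a[0], b[1] - a[1]) / max(b[2] - a[2], 1e-6)
        for a, b in zip(points, points[1:])
    ]
    max_speed = max(speeds) or 1.0
    end_t = times[-1]
    events = [points[0]]
    t = times[0]
    while True:
        # Segment of the model trajectory that contains the current time
        seg = min(max(bisect.bisect_right(times, t) - 1, 0), len(speeds) - 1)
        # Sparse events when moving fast, dense events when moving slowly
        interval = slow_ms + (fast_ms - slow_ms) * speeds[seg] / max_speed
        # Time jitter, clamped so timestamps stay strictly increasing
        t += max(interval + rng.uniform(-t_jitter, t_jitter), 1.0)
        if t >= end_t:
            break
        x, y = interpolate(points, times, t)
        events.append((x + rng.uniform(-xy_jitter, xy_jitter),
                       y + rng.uniform(-xy_jitter, xy_jitter), t))
    events.append(points[-1])  # always land exactly on the final point
    return events


def smoothstep(u):
    return 3 * u * u - 2 * u ** 3


# Hypothetical model output: 30 points, 27 ms apart, easing in and out over 700 px.
model_points = [(700.0 * smoothstep(i / 29), 0.0, i * 27.0) for i in range(30)]
events = resample(model_points, rng=random.Random(0))
```

Positions are always read off the model's own trajectory at the sampled timestamps, so the predicted timing contour is preserved up to the small jitter; only the number and spacing of events change between runs.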


Results: How Does It Look

Let's look at the comparison directly.

Left: three-stage pipeline generated trajectory (Bezier skeleton + NoiseModel spatial deviation + GRU timing). Right: pure Bezier curves. The pipeline trajectory isn't a smooth mathematical curve -- it has personalized arc deviations, uneven speed, and subtle hesitation near the end.

The dynamic comparison is more intuitive:

  • Mechanical (gray): instant teleport to target, pure algorithm
  • Bezier (blue): smooth uniform glide, still algorithmic
  • GRU Model (gold): acceleration, hesitation, end-phase micro-adjustment — style learned from real data

Quick Start

# 1. Install dependencies
pip install torch numpy matplotlib

# 2. Place your exported .jsonl files in the data/ directory

# 3. Train NoiseModel (spatial personalization)
python training/generate_trajectories.py --train-noise --epochs 100

# 4. Train GRU (timing personalization)
python training/train_mouse_model.py --epochs 200

# 5. Visualize results
python examples/demo.py --save output.png
python examples/animate_demo.py --save output.gif


Integrate into browser automation:

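from mouse_controller import move_to_humanized

# Run inside an async function; `page` is assumed to be a Playwright-style Page object.
# Move along the generated trajectory before clicking
await move_to_humanized(page, target_x, target_y, tag="BUTTON")
# Then click at the coordinates. On a Playwright-style page, coordinate clicks
# go through page.mouse.click (page.click expects a selector).
await page.mouse.click(target_x, target_y)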


If the model fails to load, the framework automatically falls back to Bezier curves — no breaking changes.


Limitations

A few limitations worth being transparent about:

Data volume. A few hundred samples is barely sufficient; the ideal is thousands per person. The good news is this is a positive flywheel — more data means a better clone of you.

NoiseModel's assumption. The current "real = Bezier + residual" assumption is strong. A better approach would use a generative model (diffusion or VAE) to generate full trajectories directly.

Multimodal. Mouse trajectory is just one dimension of behavioral fingerprinting. Keyboard rhythm, scroll patterns, mouse dwell time aren't modeled yet. For keyboard rhythm, add a keyboard.jsonl under data/ (same format as trajectory) and adapt the scripts under training/ to reuse the existing GRU pipeline.

These limitations are also directions for future iteration.


Closing Thoughts

The AI browser agent ecosystem in 2026 is maturing rapidly. Operation accuracy is no longer the core bottleneck. The next bottleneck is trust — platforms don't trust AI traffic, users don't trust mechanical operations.

The future of AI operating computers shouldn't be "a generic robot simulating a generic human" — it should be your AI assistant, operating in your style, with your decision preferences, and behavioral characteristics that match you.

All code and data in this project are for technical research and personalized AI research use only.


Project code: https://github.com/YuBing-link/mouse-behavioral-clone

Paper reference: FP-Agent: Fingerprinting AI Browsing Agents, arXiv:2605.01247, May 2026