Blocking Secrets Before They Hit the Repository: Building a Pre-Commit Hook With ML
Patience Mpo · 2026-05-16 · via DEV Community

There are two places you can catch an exposed secret.

After it's in the repository — in a CI/CD pipeline scan, a periodic audit, or a breach notification from a security researcher who found it in your public history. Or before it ever gets there — at the moment of git commit, when the developer is still at their keyboard and the fix takes thirty seconds.

The second option is better in every dimension. Earlier detection means lower remediation cost. A blocked commit means no credential rotation required, no incident response, no git history rewriting. The developer who gets stopped at commit understands immediately what they did and why — the context is fresh, the fix is obvious.

The challenge is UX.

A pre-commit hook that's too slow gets disabled. A hook that generates too many false positives gets disabled. A hook that doesn't explain itself gets disabled and complained about on Slack. A hook that developers trust — that's fast, precise, and tells them exactly what it found and why — stays enabled and actually prevents exposures.

This article is about building a pre-commit hook that developers will actually leave on.


What the Hook Needs to Do

Before writing a line of code, I defined what a good pre-commit secrets hook looks like from the developer's perspective.

Speed. The hook runs on every commit. If it adds more than two or three seconds, developers will notice and resent it. On a typical feature branch with a handful of changed files, the scan needs to complete in under two seconds.

Scope. The hook should scan staged content — only the files about to be committed — not the entire repository. Scanning everything on every commit is unnecessary and slow.

Signal clarity. When the hook blocks a commit, the developer needs to know immediately: which file, which line, what variable, why it was flagged. "Secret detected" with no context is useless. "HIGH confidence (94%): api_key = "sk-proj-abc123..." in config/settings.py line 47 — matches OpenAI key format" is actionable.

Suppression path. Developers need a documented, low-friction way to handle false positives. The hook can't be a hard wall with no escape — that's how hooks get disabled entirely.

Non-destructive. The hook never modifies files. It either passes silently or blocks and explains. That's it.


Architecture: Scanning Staged Content

The first architectural decision is what to scan. There are two options:

Option A: Scan the working tree — the files as they currently exist on disk, including unstaged changes.

Option B: Scan the staged content — exactly what git diff --cached shows, which is what will actually be committed.

Option B is correct. Scanning the working tree means flagging things the developer hasn't committed and may never intend to commit. That's noise. Scanning staged content means flagging exactly what's about to enter the repository — which is the precise intervention point.

import subprocess


def get_staged_content() -> dict[str, str]:
    """Get the staged content for all modified/added files."""
    staged_files = {}

    # Get list of staged files
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True
    )

    filenames = result.stdout.strip().split('\n')

    for filename in filenames:
        if not filename:
            continue

        # Get staged content (not working tree content).
        # errors="replace" keeps binary blobs from raising UnicodeDecodeError.
        content_result = subprocess.run(
            ["git", "show", f":{filename}"],
            capture_output=True, text=True, errors="replace"
        )

        if content_result.returncode == 0:
            staged_files[filename] = content_result.stdout

    return staged_files


The --diff-filter=ACM flag limits to Added, Copied, and Modified files — not deletions. Scanning deleted file content would generate findings for secrets that are being removed, which is the wrong direction.
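One caveat with the listing step: by default Git quotes paths that contain spaces or non-ASCII characters, so splitting the output on newlines can produce names that git show cannot resolve. Below is a minimal sketch of a more robust listing, assuming Git's -z option (NUL-terminated, unquoted paths). The helper names parse_nul_separated and list_staged_files are illustrative and not part of the original hook.

```python
import subprocess


def parse_nul_separated(raw: str) -> list[str]:
    """Split NUL-terminated git output into a list of paths."""
    return [name for name in raw.split("\0") if name]


def list_staged_files() -> list[str]:
    """List added, copied, and modified staged paths without path quoting."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return parse_nul_separated(result.stdout)
```

Keeping the parsing in its own function means it can be unit-tested without a repository.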


The Scan Loop: From Staged Content to Findings

The hook extracts string literal assignments from each staged file and passes them through the ML classifier:

def scan_staged_files(staged_content: dict[str, str], threshold: float = 0.7):
    findings = []

    for filepath, content in staged_content.items():
        # Skip binary files, lock files, and known safe extensions
        if should_skip_file(filepath):
            continue

        lines = content.split('\n')

        for line_num, line in enumerate(lines, 1):
            # Skip lines with suppression annotation
            if '# secrets-ignore' in line or '# nosec' in line:
                continue

            # Extract (key_name, value) pairs from string assignments
            assignments = extract_string_assignments(line)

            for key_name, value in assignments:
                if len(value) < 8:  # Skip very short strings
                    continue

                features = extract_features(value, key_name)
                confidence = model.predict_proba([features])[0][1]

                if confidence >= threshold:
                    findings.append({
                        "file": filepath,
                        "line": line_num,
                        "key_name": key_name,
                        "value_preview": value[:20] + "..." if len(value) > 20 else value,
                        "confidence": confidence,
                        "severity": confidence_to_severity(confidence)
                    })

    return findings


A few implementation details worth highlighting:

should_skip_file() excludes file types that generate systematic false positives: package-lock.json, yarn.lock, *.sum (Go module checksums), *.min.js (minified JavaScript), binary file extensions, and image files. These are maintained in a skip list rather than being hardcoded into the scan logic, so teams can extend it for their specific false positive patterns.

Value preview truncation. The finding reports only the first 20 characters of the flagged value, with ... truncation. Showing the full value in terminal output creates a secondary exposure — if someone is screen sharing when the hook fires, the secret shouldn't appear in full in the terminal.

Minimum length of 8. Strings shorter than 8 characters are almost never secrets. This eliminates a class of false positives from short configuration values and reduces scan time on files with many string literals.
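The scan loop calls should_skip_file() and extract_string_assignments() without showing them. The sketch below is one plausible shape for both, not the author's implementation. The pattern list is partly assumed (the image and archive extensions are examples), and the regular expression is deliberately simple: it handles name = "value", name: 'value', and JSON-style "name": "value", but not escaped quotes or multi-line strings.

```python
import re
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical skip list; teams would extend it with their own patterns.
SKIP_PATTERNS = (
    "package-lock.json", "yarn.lock", "*.sum", "*.min.js",
    "*.png", "*.jpg", "*.jpeg", "*.gif", "*.ico", "*.pdf", "*.zip",
)

# Simplified assignment matcher: key, optional closing quote, ':' or '=', quoted value.
ASSIGNMENT_RE = re.compile(
    r"""(?P<key>[A-Za-z_][A-Za-z0-9_]*)["']?\s*[:=]\s*(?P<quote>["'])(?P<value>.*?)(?P=quote)"""
)


def should_skip_file(filepath: str, patterns: tuple[str, ...] = SKIP_PATTERNS) -> bool:
    """Return True when the file name matches any skip pattern."""
    name = PurePosixPath(filepath).name
    return any(fnmatch(name, pattern) for pattern in patterns)


def extract_string_assignments(line: str) -> list[tuple[str, str]]:
    """Return (key_name, value) pairs for string literals assigned on this line."""
    return [(m.group("key"), m.group("value")) for m in ASSIGNMENT_RE.finditer(line)]
```

Matching on the file name rather than the full path keeps the patterns short. A team that needs directory-level exclusions would have to match against the whole path instead.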


The Output: Making Findings Actionable

The most important UX decision in the hook is what to show when a commit is blocked. I went through four iterations of the output format before settling on one that developers responded well to.

Iteration 1 (too terse):

BLOCKED: Secret detected in config/settings.py


Developers immediately asked: "What secret? Where exactly? What should I do?"

Iteration 2 (better but still vague):

BLOCKED: Possible secret at config/settings.py:47


Still not enough context. Developers had to open the file and count to line 47 to understand what was flagged.

Iteration 3 (too verbose):

[SECRETS DETECTOR] 
==========================================
COMMIT BLOCKED — POTENTIAL SECRET DETECTED
==========================================
File: config/settings.py
Line: 47
Variable: api_key
Value (truncated): sk-proj-abc123...
Confidence: 94%
Severity: CRITICAL
Matched Pattern: OpenAI API key format (sk-proj-*)
Feature contributions:
  - key_name_risk: 0.90 (HIGH)
  - shannon_entropy: 5.82 (HIGH)
  - pattern_openai_key: 1.00 (MATCH)
  - repetition_ratio: 0.94 (HIGH)

To suppress this finding, add '# secrets-ignore' to line 47
To bypass this check entirely (NOT RECOMMENDED): git commit --no-verify
==========================================


This is technically complete but overwhelming. Developers in flow state don't want to read a report. They want to know: what, where, what to do.

Final version (what shipped):

🔴 Secrets Detector — Commit Blocked

  CRITICAL (94%) · config/settings.py:47
  api_key = "sk-proj-abc123..."
  ↳ Matches OpenAI key format · High entropy · Sensitive variable name

  To suppress false positive: add  # secrets-ignore  to line 47
  To use env vars instead:    export API_KEY="your-key"
                              then  api_key = os.environ["API_KEY"]

1 finding blocked this commit. Fix the issue or suppress with justification.


The final format answers the three questions developers actually have in two seconds of reading: what is it (OpenAI key), where is it (file and line), what do I do (env var example or suppression). The feature contributions are available in verbose mode (--verbose) but don't appear by default.

The emoji is intentional. 🔴 provides an immediate visual signal in terminals that render it. In terminals that don't, the glyph may show up as a placeholder character, but the text that follows still carries the full message.
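The verbose output lists shannon_entropy as one of the feature contributions. The original extract_features() is not shown in this article, so the following is only a sketch of how such a feature is commonly computed: the entropy of the character distribution, in bits per character.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the string's characters, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    # Sum -p * log2(p) over the observed character frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Randomly generated keys tend to score higher than human-chosen words, which is why entropy is a common signal in secret detectors. On its own it also flags hashes and UUIDs, so it works best combined with other features.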


Handling Multiple Findings

When multiple findings exist, the output stacks them:

🔴 Secrets Detector — Commit Blocked

  CRITICAL (96%) · src/database.py:12
  DB_PASSWORD = "Tr0ub4dor&3"
  ↳ High-risk variable name · Matches human-chosen password pattern

  HIGH (78%) · src/config.py:34
  internal_token = "prod-service-backend-2019"
  ↳ Moderate-risk variable name · Low entropy but sensitive context

2 findings blocked this commit. Fix all issues before committing.


Findings are sorted by confidence descending — the most certain findings appear first, which is where the developer's attention should go.

The commit is blocked if any finding meets or exceeds the threshold, not just the highest-confidence one. A batch of MEDIUM confidence findings is still a blocked commit. If all findings are genuine false positives, they should all be suppressed with justification, not just the top one.
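The sorting and blocking rules can be sketched as a small rendering function. This is an illustration under assumptions: the severity cut-offs (0.90 and 0.75) are guesses consistent with the examples in this article rather than the author's values, render_report is a hypothetical name, and the header and per-finding lines are simplified (no reason line).

```python
def confidence_to_severity(confidence: float) -> str:
    """Map a confidence score to a label (cut-offs are assumed)."""
    if confidence >= 0.90:
        return "CRITICAL"
    if confidence >= 0.75:
        return "HIGH"
    return "MEDIUM"


def render_report(findings: list[dict]) -> tuple[str, int]:
    """Return the report text and the exit code the hook should use."""
    if not findings:
        return "", 0  # pass silently

    # Most certain findings first.
    ordered = sorted(findings, key=lambda item: item["confidence"], reverse=True)
    lines = ["🔴 Secrets Detector: Commit Blocked", ""]
    for item in ordered:
        severity = confidence_to_severity(item["confidence"])
        lines.append(f'  {severity} ({item["confidence"]:.0%}) · {item["file"]}:{item["line"]}')
        lines.append(f'  {item["key_name"]} = "{item["value_preview"]}"')
        lines.append("")
    noun = "finding" if len(ordered) == 1 else "findings"
    lines.append(f"{len(ordered)} {noun} blocked this commit.")
    return "\n".join(lines), 1  # any finding at or above the threshold blocks
```

Returning the exit code alongside the text keeps the function free of side effects. The hook's entry point would print the text and call sys.exit() with the code.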


The Suppression UX

The suppression path needs to be low-friction but not invisible. If suppressing a false positive is too hard, developers will use git commit --no-verify to bypass the hook entirely — which defeats the purpose.

The designed flow:

# Developer encounters a false positive:
# file_integrity_hash = "d8e8fca2dc0f896fd7cb4cb0031ba249"  ← flagged

# They add the annotation with a justification:
# MD5 hash for file integrity check only — not a credential
file_integrity_hash = "d8e8fca2dc0f896fd7cb4cb0031ba249"  # secrets-ignore

# Commit proceeds normally on next attempt


The # secrets-ignore annotation is visible in code review. A reviewer can see that a suppression was added and evaluate whether the justification is reasonable. This is the governance layer — suppressions can't happen silently.
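Reviewers can be helped along with a small check that lists suppressions lacking a justification. The sketch below is not part of the original hook. It assumes the convention shown in the flow above, where the justification is a comment on the line directly before the annotated line, and it assumes #-style comments.

```python
def find_unjustified_suppressions(content: str, marker: str = "# secrets-ignore") -> list[int]:
    """Return 1-based line numbers that carry the marker with no comment line directly above."""
    lines = content.split("\n")
    flagged = []
    for index, line in enumerate(lines):
        if marker not in line:
            continue
        previous = lines[index - 1].strip() if index > 0 else ""
        if not previous.startswith("#"):
            flagged.append(index + 1)
    return flagged
```

Running this over the staged content and printing a warning (rather than blocking) would keep the suppression path low-friction while still nudging developers to explain themselves.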

The hook also respects the SECRETS_DETECTOR_THRESHOLD environment variable, which allows individual developers to adjust their personal threshold without modifying shared configuration:

# Developer who wants to see more findings (lower threshold)
SECRETS_DETECTOR_THRESHOLD=0.55 git commit -m "wip"

# Developer who wants fewer false positives (higher threshold)
SECRETS_DETECTOR_THRESHOLD=0.85 git commit -m "feature: payment flow"


This flexibility matters for adoption. Some developers will want to see everything; others will want a tighter filter. Forcing everyone to the same threshold is a source of friction.
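Reading the override safely matters, because a typo in the environment variable should not crash the hook or silently disable it. Here is a minimal sketch, assuming the default of 0.7 used elsewhere in this article. The function name threshold_from_env is hypothetical, and how the real hook prioritises the variable against the --threshold argument is not specified here.

```python
import os
from collections.abc import Mapping


def threshold_from_env(default: float = 0.7, env: Mapping[str, str] = os.environ) -> float:
    """Read SECRETS_DETECTOR_THRESHOLD, falling back to the default when unset or invalid."""
    raw = env.get("SECRETS_DETECTOR_THRESHOLD")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    # A probability threshold outside [0, 1] is meaningless, so ignore it.
    return value if 0.0 <= value <= 1.0 else default
```

Passing the environment as a parameter makes the function easy to test without mutating os.environ.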


Installation: Making Setup Frictionless

A hook that's hard to install never gets installed. The setup needs to fit in a handful of commands:

# Using pre-commit framework (recommended)
pip install pre-commit
echo "repos:
- repo: https://github.com/pgmpofu/secrets-detector
  rev: v1.0.0
  hooks:
  - id: secrets-detector
    args: [--threshold, '0.7']" > .pre-commit-config.yaml
pre-commit install


Or manual installation for teams not using the pre-commit framework:

# Copy hook to git hooks directory
cp hooks/pre-commit .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit


The pre-commit framework approach is preferable for teams because it version-pins the hook and makes it part of the repository configuration (.pre-commit-config.yaml is committed). Each new team member still needs to run pre-commit install once after cloning, since Git does not copy hooks on git clone. The manual approach works for individual use.


What Happens at git commit --no-verify

This is the escape hatch that can't be removed. Git's --no-verify flag bypasses the pre-commit and commit-msg hooks, and there's nothing a hook can do to prevent it.

The right response to this is not technical — it's cultural and procedural.

In a team setting, git commit --no-verify should require a comment in the commit message explaining why the hook was bypassed. Git does not record whether the flag was used, so the bypass cannot be detected directly. What CI/CD can do is look for an agreed marker phrase in the commit messages of a PR and report which commits carry a documented bypass:

# In GitHub Actions
- name: Check for hook bypasses
  run: |
    # Git does not record --no-verify, so look for the agreed marker in each message
    git log --format=%h origin/main..HEAD | while read -r hash; do
      if git log --format=%B -n 1 "$hash" | grep -q "no-verify bypass"; then
        echo "Documented bypass found in $hash"
      fi
    done


The goal is to make --no-verify traceable, not to make it impossible. A developer in a genuine emergency who needs to commit right now and deal with the secret later should be able to do that — but there should be a record of the decision.


Measuring Hook Effectiveness

After the hook has been running for a few weeks, three metrics tell you whether it's working:

Bypass rate. What percentage of commits carry a documented --no-verify bypass? (Git does not record the flag itself, so the commit-message marker is the only signal.) A bypass rate above 10% suggests the hook is generating too many false positives or too much friction. Investigate which developers are bypassing most frequently and why.

Suppression rate. What percentage of findings are suppressed rather than fixed? High suppression rates indicate either noisy rules or developers treating suppression as the default response. Review suppressions in code review and push back on suppression-without-justification.

Secrets found in CI despite the hook. If your CI pipeline also runs a secrets scan and finds things the pre-commit hook didn't catch, those are false negatives worth understanding. Each one is an opportunity to improve the hook's coverage.

The hook is not a complete solution — it's the first line of defence. CI scanning is the second. Periodic full history scanning is the third. Each layer catches what the previous one misses.
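The first two metrics are simple ratios. The sketch below shows one way to compute them, assuming commit messages have already been collected (for example from git log) and that bypasses are documented with the "no-verify bypass" marker used in the CI step. The function names and the shape of the inputs are illustrative.

```python
def rate(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, or 0.0 when there is nothing to measure."""
    return 100.0 * part / whole if whole else 0.0


def bypass_rate(commit_messages: list[str], marker: str = "no-verify bypass") -> float:
    """Percentage of commits whose message documents a hook bypass."""
    bypassed = sum(1 for message in commit_messages if marker in message)
    return rate(bypassed, len(commit_messages))


def suppression_rate(suppressed: int, fixed: int) -> float:
    """Percentage of findings that were suppressed rather than fixed."""
    return rate(suppressed, suppressed + fixed)
```

For example, one documented bypass in four commits gives a bypass rate of 25%, well above the 10% level that suggests too much friction.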


The Broader Point: Shift Left Has a UX Requirement

"Shift left" — catching security issues earlier in the development lifecycle — is the right strategy. The economics of security defects favour it: the earlier an issue is caught, the less it generally costs to remediate.

But shift left only works if the shifted controls are actually used. A pre-commit hook that developers disable after the first false positive has shifted nothing. A CI gate that gets bypassed in every release has shifted nothing.

The investment in UX — the careful output format, the clear suppression path, the fast scan, the explainable findings — is not cosmetic. It's what determines whether the security control actually operates or sits dormant in the repository while credentials quietly accumulate in git history.

Security controls that developers trust are security controls that get used. That's the only metric that matters.


The pre-commit hook implementation is in hooks/pre-commit at github.com/pgmpofu/secrets-detector.

Last article in the series: I ran the secrets detector against my own repositories — here's what it actually found, the false positives I encountered, and what the real-world numbers looked like.