Building a Multi-Source Threat Intelligence Correlation Engine in Python
platinum2high · 2026-06-14 · via DEV Community

A SOC analyst's notes on going from "I want to learn async" to a working tool that other analysts can clone and use.


TL;DR

I'm a SOC analyst learning Python and built IOC Hunter — an async tool that takes a chunk of text (phishing report, log dump, Slack export), extracts every indicator inside, queries six threat-intel sources in parallel, and produces a verdict you can drop into a ticket or a SIEM.

This article is the why and the how — the architectural decisions I had to think through, the things that bit me, and a small dose of "what I learned about myself as an engineer."

platinum2high / ioc-hunter

Async threat intelligence correlation engine. Auto-parses IOCs from raw text, enriches them across 6 TI feeds in parallel, exports STIX/MISP/Sigma/Suricata. Works keyless out of the box.


The Problem

I sit in a SOC. The shape of my day is: alert fires → triage → mostly boring → occasionally interesting → write a ticket.

The "occasionally interesting" part is where I noticed the same workflow repeating. Someone forwards me a phishing email. The body has IPs, URLs, hashes, an email address. Half of them are defanged (evil[.]com, hxxps://, bad[at]evil[.]com). Some are encoded — base64 in the headers, hex in the payload.

To triage, I do roughly this:

  1. Refang each indicator by hand
  2. Open VirusTotal, paste
  3. Open AbuseIPDB, paste
  4. Open URLhaus, paste
  5. Mentally aggregate "VT says X, AbuseIPDB says Y, URLhaus has it as Z"
  6. Decide
  7. Write the ticket, paraphrasing the sources

This is a 30-minute manual process for what should be 30 seconds. And most existing IOC checkers I found on GitHub were 1:1: one IOC in, one source out. They didn't solve the workflow problem — they just slightly automated step 2.

So I wrote one that solves the whole thing.


What it actually does

$ ioc-hunter check "185[.]220[.]101[.]42"

Output (simplified):

╭─────── IOC Hunter ────────╮
│ 185[.]220[.]101[.]42      │
│ type: ipv4                │
│                           │
│ MALICIOUS  confidence 46% │
╰───────────────────────────╯

Source       Verdict      Score   Notes
─────────────────────────────────────────────
tor_exit     SUSPICIOUS    0.50   tor, anonymizer
abuseipdb    MALICIOUS     1.00   country:DE, isp:Tor-Exit traffic
otx          MALICIOUS     1.00   Bruteforce, SSH, Honeypot
virustotal   MALICIOUS     0.15   suspicious-udp, tor
urlhaus      UNKNOWN       0.00
threatfox    UNKNOWN       0.00

Six sources are queried in parallel. The tool accepts defanged input and defangs its output (so you can paste the result into a chat without anyone clicking it), and the weighted verdict lists each source's contribution explicitly so you can defend the call in a ticket.
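The defang-in, defang-out behaviour can be illustrated with a minimal sketch. The function names `refang` and `defang` and the exact patterns below are assumptions for illustration; the real project may recognise more (or different) defang styles.

```python
import re

# Hypothetical rule set: only the defang styles mentioned in this post.
_REFANG_RULES = [
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[@\]|\[at\]", re.IGNORECASE), "@"),
    (re.compile(r"\bhxxp(s?)://", re.IGNORECASE), r"http\1://"),
]


def refang(value: str) -> str:
    """Turn a defanged indicator back into its live form for lookups."""
    for pattern, replacement in _REFANG_RULES:
        value = pattern.sub(replacement, value)
    return value


def defang(value: str) -> str:
    """Make a live indicator unclickable before showing it to a human."""
    value = re.sub(r"^http(s?)://", r"hxxp\1://", value, flags=re.IGNORECASE)
    return value.replace("@", "[@]").replace(".", "[.]")
```

Usage would look like `defang(refang(user_input))`: normalise whatever the analyst pasted, query with the live value, and only ever print the defanged one. Note that this `defang` is not idempotent, so it should only be applied to refanged values.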

But the real feature is scan-file — drop in a 200-line incident report, get back every indicator inside, each enriched, sorted by confidence. And correlate finds the pivots: shared infrastructure, shared malware tags, URL-to-host relationships across the batch.
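To give a feel for the "shared malware tags" pivot, here is a small sketch that inverts a per-IOC tag map. The function name `shared_tag_pivots` and the input shape are assumptions, not the project's actual API, and the sample indicators are documentation-range placeholders.

```python
from collections import defaultdict


def shared_tag_pivots(tags_by_ioc: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each tag to the IOCs carrying it, keeping tags seen on 2+ IOCs."""
    iocs_by_tag: dict[str, set[str]] = defaultdict(set)
    for ioc, tags in tags_by_ioc.items():
        for tag in tags:
            iocs_by_tag[tag].add(ioc)
    # A tag only counts as a pivot when it links at least two indicators.
    return {
        tag: sorted(iocs)
        for tag, iocs in iocs_by_tag.items()
        if len(iocs) >= 2
    }


batch = {
    "192.0.2.10": {"tor", "bruteforce"},
    "198.51.100.7": {"tor"},
    "evil.example": {"phishing"},
}
pivots = shared_tag_pivots(batch)
```

Here `pivots` links the two IPs through the shared `tor` tag, while tags that appear on a single indicator are dropped.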


Architectural Decisions That Took Thought

1. The plugin pattern for sources

I want adding a new TI feed to be one file, no other changes anywhere.

class Source(ABC):
    name: str
    weight: float
    supported_types: frozenset[IOCType]
    requires_key: bool = False

    @abstractmethod
    async def lookup(self, ioc_type: IOCType, ioc_value: str) -> SourceResult:
        ...

Each source is a class with class-level metadata (weight, supported_types, requires_key) and one method. The orchestrator introspects the metadata to pick which sources to query for each IOC and to skip ones whose key isn't configured.

This means I can drop in a Shodan source tomorrow and not touch the engine, scorer, or CLI.

2. Graceful degradation > opinionated requirements

A naive design: "no API keys → tool doesn't work." A user-friendly design: every source short-circuits to UNKNOWN if its key is missing, with an explanatory error message; the rest run normally.

@property
def is_configured(self) -> bool:
    return not self.requires_key or bool(self._api_key)

The orchestrator skips unconfigured sources before they ever fire a request. So if you clone my repo and run it without registering for anything, you still get a verdict — just from the one truly-keyless source (Tor exit list). Five API keys unlock the rest.

This is the difference between "demo project" and "tool people actually try." Anyone cloning it sees output in 30 seconds.
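Putting the Source contract from section 1 together with the is_configured check, a rough sketch of how source selection could work is shown below. `StaticBlocklist`, `KeyedFeed`, `eligible_sources`, the constructor signature, and the trimmed-down `IOCType`, `Verdict`, and `SourceResult` types are all hypothetical stand-ins, not the project's real classes.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IOCType(Enum):
    IPV4 = "ipv4"
    DOMAIN = "domain"


class Verdict(Enum):
    MALICIOUS = "malicious"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class SourceResult:
    source: str
    verdict: Verdict
    score: float = 0.0


class Source(ABC):
    name: str
    weight: float
    supported_types: frozenset[IOCType]
    requires_key: bool = False

    def __init__(self, api_key: Optional[str] = None) -> None:
        self._api_key = api_key

    @property
    def is_configured(self) -> bool:
        return not self.requires_key or bool(self._api_key)

    @abstractmethod
    async def lookup(self, ioc_type: IOCType, ioc_value: str) -> SourceResult:
        ...


class StaticBlocklist(Source):
    """Hypothetical keyless source backed by an in-memory set."""

    name = "static_blocklist"
    weight = 0.5
    supported_types = frozenset({IOCType.IPV4})
    _KNOWN_BAD = frozenset({"192.0.2.10"})

    async def lookup(self, ioc_type: IOCType, ioc_value: str) -> SourceResult:
        if ioc_value in self._KNOWN_BAD:
            return SourceResult(self.name, Verdict.MALICIOUS, 1.0)
        return SourceResult(self.name, Verdict.UNKNOWN)


class KeyedFeed(Source):
    """Hypothetical source that needs an API key before it may be queried."""

    name = "keyed_feed"
    weight = 1.0
    supported_types = frozenset({IOCType.IPV4, IOCType.DOMAIN})
    requires_key = True

    async def lookup(self, ioc_type: IOCType, ioc_value: str) -> SourceResult:
        return SourceResult(self.name, Verdict.UNKNOWN)


def eligible_sources(sources: list[Source], ioc_type: IOCType) -> list[Source]:
    """Keep sources that support this IOC type and have any required key."""
    return [
        s for s in sources
        if ioc_type in s.supported_types and s.is_configured
    ]
```

With this shape, adding a feed means adding one subclass; the selection function never needs to know which feeds exist.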

3. Transparent weighted scoring, not a black box

Every verdict comes with the per-source contribution. The scoring formula is:

weighted: dict[Verdict, float] = dict.fromkeys(Verdict, 0.0)
for r in valid_results:
    w = sources_by_name[r.source].weight
    if r.verdict in {MALICIOUS, SUSPICIOUS}:
        weighted[r.verdict] += w * max(r.score, MIN_PRESENCE_SCORE)
    elif r.verdict is BENIGN:
        weighted[r.verdict] += w

Severity-prioritized thresholds then pick the final verdict (for example, a malicious share of at least 25% wins).

The whole function is 30 lines. An analyst can read it and reproduce the verdict on paper. That matters when defending a finding in an incident review.
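For readers who want to reproduce a verdict on paper, here is a self-contained sketch of the same idea. Several things are assumptions that the post does not specify: the value of `MIN_PRESENCE_SCORE`, the equal per-source weights, the suspicious threshold, and the choice to divide by the total weight of sources that returned a non-UNKNOWN verdict. Only the 25% malicious threshold comes from the text above, so this sketch is not expected to reproduce the 46% shown in the sample output.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    MALICIOUS = "malicious"
    SUSPICIOUS = "suspicious"
    BENIGN = "benign"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class SourceResult:
    source: str
    verdict: Verdict
    score: float


MIN_PRESENCE_SCORE = 0.1    # assumed value
MALICIOUS_THRESHOLD = 0.25  # from the post
SUSPICIOUS_THRESHOLD = 0.25  # assumed value


def aggregate(
    results: list[SourceResult], weights: dict[str, float]
) -> tuple[Verdict, float]:
    """Return (verdict, share) from per-source results and source weights."""
    valid = [r for r in results if r.verdict is not Verdict.UNKNOWN]
    weighted = dict.fromkeys(Verdict, 0.0)
    for r in valid:
        w = weights[r.source]
        if r.verdict in {Verdict.MALICIOUS, Verdict.SUSPICIOUS}:
            weighted[r.verdict] += w * max(r.score, MIN_PRESENCE_SCORE)
        elif r.verdict is Verdict.BENIGN:
            weighted[r.verdict] += w

    # Assumed normalisation: total weight of sources that had an opinion.
    total = sum(weights[r.source] for r in valid)
    if total == 0:
        return Verdict.UNKNOWN, 0.0

    # Check the most severe verdict first so it wins ties.
    for verdict, threshold in (
        (Verdict.MALICIOUS, MALICIOUS_THRESHOLD),
        (Verdict.SUSPICIOUS, SUSPICIOUS_THRESHOLD),
    ):
        share = weighted[verdict] / total
        if share >= threshold:
            return verdict, share
    if weighted[Verdict.BENIGN] > 0:
        return Verdict.BENIGN, weighted[Verdict.BENIGN] / total
    return Verdict.UNKNOWN, 0.0


# The four responding sources from the sample output, with equal weights.
sample = [
    SourceResult("tor_exit", Verdict.SUSPICIOUS, 0.50),
    SourceResult("abuseipdb", Verdict.MALICIOUS, 1.00),
    SourceResult("otx", Verdict.MALICIOUS, 1.00),
    SourceResult("virustotal", Verdict.MALICIOUS, 0.15),
]
equal_weights = {r.source: 1.0 for r in sample}
verdict, share = aggregate(sample, equal_weights)
```

With equal weights the malicious mass is 1.00 + 1.00 + 0.15 = 2.15 out of a total weight of 4, which clears the 25% threshold.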

4. Async concurrency with a global cap

class Engine:
    def __init__(self, sources, *, cache=None, max_concurrency=8):
        self._sem = asyncio.Semaphore(max_concurrency)

    async def _lookup_cached(self, ioc, source):
        if self._cache and (hit := self._cache.get(...)):
            return hit
        async with self._sem:
            return await source.lookup(...)

The semaphore is shared across all sources and all IOCs. So when the analyst feeds in 100 IOCs, the engine doesn't slam every source with 100 simultaneous requests — it pipelines them through the cap.

The free tiers of these APIs have rate limits (VirusTotal: 4 req/minute on free). Without the cap I'd hit 429s instantly.
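The effect of the shared semaphore is easy to see in a toy version. The sketch below replaces the real HTTP lookups with `asyncio.sleep` and records the peak number of lookups in flight; `ToyEngine`, `peak`, and `demo` are illustrative names, not part of the project.

```python
import asyncio


class ToyEngine:
    """Stand-in engine: one semaphore shared by every (IOC, source) pair."""

    def __init__(self, max_concurrency: int = 8) -> None:
        self._sem = asyncio.Semaphore(max_concurrency)
        self._in_flight = 0
        self.peak = 0  # highest number of simultaneous lookups observed

    async def _lookup(self, ioc: str, source: str) -> tuple[str, str, str]:
        async with self._sem:
            self._in_flight += 1
            self.peak = max(self.peak, self._in_flight)
            try:
                await asyncio.sleep(0.01)  # placeholder for the HTTP request
                return ioc, source, "UNKNOWN"
            finally:
                self._in_flight -= 1

    async def run(self, iocs: list[str], sources: list[str]):
        # Every pair becomes a task, but only max_concurrency run at once.
        tasks = [self._lookup(i, s) for i in iocs for s in sources]
        return await asyncio.gather(*tasks)


async def demo() -> tuple[int, int]:
    engine = ToyEngine(max_concurrency=8)
    iocs = [f"192.0.2.{n}" for n in range(20)]
    results = await engine.run(iocs, ["a", "b", "c"])
    return len(results), engine.peak


count, peak = asyncio.run(demo())
```

All 60 lookups complete, yet `peak` never exceeds the cap. One caveat worth keeping in mind: a semaphore bounds how many lookups are in flight at once, not how many start per minute, so a strict per-minute quota would still need a separate rate limiter per source.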


Things That Bit Me

URLhaus and ThreatFox now require auth

Until mid-2024 they were truly keyless. The abuse.ch team then added an Auth-Key requirement to fight scraper abuse. The key is free and registration is instant, but my "everything-keyless" pitch had to become "Tor is keyless, everything else needs a free signup."

This is fine, but it taught me to always link to the registration URL from the error message when a source short-circuits. Don't make the user dig.

VirusTotal URL IDs are not URLs

VT's v3 API expects URLs as urlsafe-base64(url) with padding stripped. I lost an hour to this before reading their docs carefully:

import base64

def _vt_url_id(url: str) -> str:
    # VT v3 URL identifier: URL-safe base64 of the URL with "=" padding stripped
    return base64.urlsafe_b64encode(url.encode()).rstrip(b"=").decode()
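When debugging, it helps to be able to go the other way and recover the URL from an identifier. The sketch below pairs the encoder with a decoder that restores the stripped padding; `vt_url_id` and `vt_url_from_id` are illustrative names, and the sample URL is a placeholder.

```python
import base64


def vt_url_id(url: str) -> str:
    """URL-safe base64 of the URL, with trailing '=' padding removed."""
    return base64.urlsafe_b64encode(url.encode()).rstrip(b"=").decode()


def vt_url_from_id(url_id: str) -> str:
    """Inverse of vt_url_id: re-add padding to a multiple of 4, then decode."""
    padding = "=" * (-len(url_id) % 4)
    return base64.urlsafe_b64decode(url_id + padding).decode()


sample_url = "http://example.com/path?q=1"
sample_id = vt_url_id(sample_url)
```

A round-trip check like `vt_url_from_id(vt_url_id(u)) == u` makes a cheap unit test that needs no network access.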

Rich's markup parser eats [@]

I render defanged values in the CLI: bad@evil.com → bad[@]evil[.]com. Rich's table renderer interpreted [@] as a (nonexistent) markup tag and silently stripped it. Output became badevil[.]com, which is completely broken.

The fix is rich.markup.escape():

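import rich.markup

def _safe(value: str) -> str:
    # Defang first, then escape so Rich prints "[@]" literally instead of parsing it as a tag
    return rich.markup.escape(defang(value))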

I now wrap every IOC value in _safe() before passing it to a Rich component. The test suite missed this bug because it verified the verdict, not the rendered string; I only noticed it while writing the README.

STIX 2.1 patterns need apostrophe-escaping

A domain IOC with an apostrophe (it's.example.com — weird but possible) breaks the STIX pattern:

[domain-name:value = 'it's.example.com']  ← invalid
[domain-name:value = 'it\'s.example.com'] ← valid

Pattern values are single-quoted in STIX, so embedded apostrophes need escaping. It took a test written specifically for this case to catch it.
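A minimal sketch of the escaping step, assuming hypothetical helper names `stix_string_literal` and `domain_pattern` (the project's exporter may be structured differently). Backslashes are escaped before apostrophes so that the backslash added for an apostrophe is not itself doubled.

```python
def stix_string_literal(value: str) -> str:
    """Quote a value for use inside a STIX pattern comparison."""
    # Order matters: escape existing backslashes first, then apostrophes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def domain_pattern(domain: str) -> str:
    """Build a STIX pattern matching one domain name."""
    return f"[domain-name:value = {stix_string_literal(domain)}]"


pattern = domain_pattern("it's.example.com")
```

Printing `pattern` gives the valid form shown above, with a single backslash in front of the apostrophe.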


The Boring Parts That Matter

If you read GitHub-shaped engineering posts, the "boring parts" — tests, CI, lint, secret scanning, Docker hygiene — get one sentence at the end. They probably deserve half the post.

217 unit tests. Every regex pattern, every source, every exporter, every scorer threshold has a test. Network is mocked via respx. The test suite runs in 0.7 seconds. I can refactor anything and know within a second if I broke something.

CI matrix. Tests run on Python 3.11 and 3.12. Ruff lints and format-checks. Docker image builds. Gitleaks scans the diff for accidentally-committed secrets. Every PR has to pass all of this before merging.

Multi-stage Docker. The runtime image is non-root, ~120 MB, doesn't include test files or the wheel-builder layer. The cache directory is a mounted volume so it survives container restarts.

None of this is impressive on its own. These are the table stakes that separate "code I'd hire someone for" from "code I'd ask them to explain in an interview."


What I Learned About Myself

I started this thinking "I'll learn asyncio." I finished thinking "asyncio was the easy part — the hard part was deciding what not to build."

Half the work was saying no:

  • No PyYAML for Sigma generation. Hand-write the YAML, save a dependency.
  • No SQLAlchemy for the cache. Stdlib sqlite3 is enough.
  • No "agent framework" for plugin sources. An ABC and a list is enough.
  • No background daemon. A CLI is enough.
  • No web UI. The Rich TUI is enough.

Every "is enough" is a thing I didn't have to test, document, maintain, or explain to a hiring manager. The project is 6,000 lines of code and 4 runtime dependencies because of that discipline.

I think this is the real seniority signal. Anyone can add a dep. Not everyone can leave one out.
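To make the "stdlib sqlite3 is enough" point concrete, here is roughly what a small result cache with expiry can look like. The class name `SQLiteCache`, the table layout, the JSON serialisation, and the TTL default are all assumptions for illustration, not the project's actual cache.

```python
import json
import sqlite3
import time
from typing import Optional


class SQLiteCache:
    """Tiny key/value cache with a time-to-live, using only the stdlib."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 3600.0) -> None:
        self._db = sqlite3.connect(path)
        self._ttl = ttl_seconds
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    @staticmethod
    def _key(source: str, ioc: str) -> str:
        return f"{source}:{ioc}"

    def get(self, source: str, ioc: str, now: Optional[float] = None) -> Optional[dict]:
        now = time.time() if now is None else now
        row = self._db.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?",
            (self._key(source, ioc),),
        ).fetchone()
        # Treat missing and expired entries the same way: a cache miss.
        if row is None or now - row[1] > self._ttl:
            return None
        return json.loads(row[0])

    def set(self, source: str, ioc: str, result: dict, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO cache (key, value, stored_at) VALUES (?, ?, ?)",
                (self._key(source, ioc), json.dumps(result), now),
            )
```

The optional `now` parameter exists only so tests can control time without sleeping or patching the clock.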


If You Want to Try It

git clone https://github.com/platinum2high/ioc-hunter
cd ioc-hunter
python -m venv .venv && source .venv/bin/activate
pip install -e .

ioc-hunter check "185[.]220[.]101[.]42"   # works keyless
ioc-hunter configure                       # walks through optional API keys
ioc-hunter scan-file examples/sample-incident.txt

Or with Docker:

cp .env.example .env
docker compose run --rm ioc-hunter check "evil[.]com"

The repo is MIT, the issue tracker is open, and I'd genuinely love feedback from SOC analysts on the scoring model, defang patterns, and sources I should add. (I'm thinking abuse.ch MalwareBazaar and GreyNoise next.)


Code: github.com/platinum2high/ioc-hunter

Reach me on LinkedIn if you want to chat about SOC tooling, threat intel, or detection engineering.