Building a faster YARA engine in pure Go
Sansec Forensics Team · 2026-02-18 · via Sansec - experts in eCommerce security

eComscan scan times before and after Yargo


YARA is the industry standard for pattern matching in malware detection. Maintained by VirusTotal, it powers threat detection at nearly every security vendor. At Sansec, we rely on YARA for eComscan and our global threat monitor, scanning hundreds of thousands of stores daily.

But YARA was primarily designed for binary malware analysis, and wrapping its C library in Go was painful. We scan text files: PHP, JavaScript, HTML templates. So we built Yargo, a pure Go YARA engine optimized for source code.

How YARA works

A YARA rule defines string patterns and a condition that determines when the rule matches:

rule php_backdoor {
    strings:
        $assert = "assert"
        $serialize = "serialize"
        $session = "session"
    condition:
        all of them
}

Under the hood, YARA extracts short byte sequences ("atoms") from each pattern and loads them into an Aho-Corasick automaton, a state machine that can match thousands of patterns in a single pass over the input. When an atom matches, YARA verifies the full pattern at that position.

The structure starts as a trie: a tree where each edge represents one byte, and paths from root to node spell out the patterns. To search, you walk the trie byte by byte through the input.


Aho-Corasick automaton for the 4-byte atoms asse, seri, and sess. Teal nodes are match states, dashed arrows are failure links.

The failure links (dashed arrows) are what make it fast: when the next byte has no matching edge, the automaton does not restart from the root. Instead it jumps to the state for the longest proper suffix of the text matched so far that is also a prefix of some pattern. The scan never moves backwards in the input, so the whole file is processed in a single pass.

This two-phase approach (cheap pre-filter, expensive verification) is what makes YARA fast enough to scan thousands of files against thousands of rules.
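To make the two phases concrete, here is a minimal sketch of an Aho-Corasick pre-filter followed by full-pattern verification, using the three strings from the example rule. It is not YARA's or Yargo's code: the `pattern`, `node`, `build`, `scan` and `match` names are invented for illustration, and the atoms are simply assumed to be the first four bytes of each string.

```go
package main

import (
	"bytes"
	"fmt"
)

// pattern pairs a full string with the atom used to pre-filter it.
// All names in this sketch are illustrative, not YARA or Yargo identifiers.
type pattern struct {
	text, atom []byte
	atomOffset int // position of atom inside text
}

type node struct {
	next map[byte]int // trie edges, one per byte
	fail int          // failure link
	out  []int        // atoms that end at this state
}

func has(nodes []node, s int, b byte) bool { _, ok := nodes[s].next[b]; return ok }

// build inserts the atoms into a trie, then adds failure links breadth-first.
func build(atoms [][]byte) []node {
	nodes := []node{{next: map[byte]int{}}} // state 0 is the root
	for id, atom := range atoms {
		cur := 0
		for _, b := range atom {
			nxt, ok := nodes[cur].next[b]
			if !ok {
				nxt = len(nodes)
				nodes = append(nodes, node{next: map[byte]int{}})
				nodes[cur].next[b] = nxt
			}
			cur = nxt
		}
		nodes[cur].out = append(nodes[cur].out, id)
	}
	var queue []int
	for _, child := range nodes[0].next {
		queue = append(queue, child) // depth-1 states fail to the root
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for b, child := range nodes[cur].next {
			f := nodes[cur].fail
			for f != 0 && !has(nodes, f, b) {
				f = nodes[f].fail
			}
			nodes[child].fail = nodes[f].next[b] // 0 (root) when the edge is missing
			nodes[child].out = append(nodes[child].out, nodes[nodes[child].fail].out...)
			queue = append(queue, child)
		}
	}
	return nodes
}

// scan reads data once, left to right, and calls hit for every atom occurrence.
func scan(nodes []node, data []byte, hit func(id, end int)) {
	state := 0
	for i, b := range data {
		for state != 0 && !has(nodes, state, b) {
			state = nodes[state].fail
		}
		state = nodes[state].next[b]
		for _, id := range nodes[state].out {
			hit(id, i+1)
		}
	}
}

// match runs the cheap atom pre-filter and verifies the full string on each hit.
func match(patterns []pattern, data []byte) []bool {
	atoms := make([][]byte, len(patterns))
	for i, p := range patterns {
		atoms[i] = p.atom
	}
	found := make([]bool, len(patterns))
	scan(build(atoms), data, func(id, end int) {
		p := patterns[id]
		start := end - len(p.atom) - p.atomOffset
		stop := start + len(p.text)
		if start >= 0 && stop <= len(data) && bytes.Equal(data[start:stop], p.text) {
			found[id] = true
		}
	})
	return found
}

func demoPatterns() []pattern {
	return []pattern{
		{text: []byte("assert"), atom: []byte("asse")},
		{text: []byte("serialize"), atom: []byte("seri")},
		{text: []byte("session"), atom: []byte("sess")},
	}
}

func main() {
	fmt.Println(match(demoPatterns(), []byte("assert(unserialize($d)); session_start();")))
	fmt.Println(match(demoPatterns(), []byte("assessment of a serif")))
}
```

The second input is the interesting one: "assessment" triggers the asse atom and, via a failure link, the sess atom, yet neither full string is present, so verification rejects both hits.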

Low-hanging fruit

Before deciding to build a new engine, we first looked at optimizing our malware signature database.

We maintain a large list of burner domains (disposable domains used by attackers to host malware or exfiltrate sensitive data), currently over 12,000 entries. These were originally written using word boundary regexes (\b[domain]\b) instead of YARA's more efficient fullword modifier.

Switching to fullword string matches eliminated thousands of expensive regex verifications per scan. The performance improvement was significant, but with the signatures optimized, the bottleneck shifted to the engine itself.
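The difference in cost is easy to see in a sketch. A fullword check amounts to a plain substring search plus a look at the two neighbouring bytes, whereas the regex form has to go through a regex engine for every candidate. The code below is an illustration in Go, not YARA's implementation; `containsFullword` is a hypothetical helper and `evil-cdn.example` is a made-up domain. It assumes fullword means "delimited by non-alphanumeric bytes", which differs slightly from `\b` (which also treats `_` as a word character).

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

func isAlnum(b byte) bool {
	return b >= '0' && b <= '9' || b >= 'a' && b <= 'z' || b >= 'A' && b <= 'Z'
}

// containsFullword reports whether word occurs in data with a non-alphanumeric
// byte (or the edge of the buffer) on both sides.
func containsFullword(data, word []byte) bool {
	if len(word) == 0 {
		return false
	}
	for off := 0; ; {
		i := bytes.Index(data[off:], word)
		if i < 0 {
			return false
		}
		start := off + i
		end := start + len(word)
		if (start == 0 || !isAlnum(data[start-1])) && (end == len(data) || !isAlnum(data[end])) {
			return true
		}
		off = start + 1 // keep looking after a rejected candidate
	}
}

func main() {
	domain := "evil-cdn.example" // hypothetical burner domain
	page := []byte(`<script src="//evil-cdn.example/x.js"></script>`)

	// Regex form, as the signatures were originally written.
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(domain) + `\b`)
	fmt.Println(re.Match(page))

	// Fullword form: literal search plus two boundary checks.
	fmt.Println(containsFullword(page, []byte(domain)))
}
```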

Outgrowing go-yara

We use Go for most of our projects, so for years we relied on go-yara, the Go bindings for the libyara C library.

The biggest pain point was cgo. It requires a C compiler, pkg-config, and a pre-installed libyara. Cross-compilation is tedious enough that the go-yara docs need a dedicated guide just to explain it. cgo also gets in the way of fully static binaries, one of Go's biggest advantages. We even maintained an ancient build server just to keep compatibility with merchants running older kernels.

On top of that, YARA's internals are optimized for binary malware analysis, not source code scanning.

Introducing Yargo

Yargo is a pure Go implementation of the YARA features we actually need. It follows the same architecture:

  1. Parse: goyacc-based LALR(1) parser turns YARA rules into an AST
  2. Compile: extract atoms from patterns, build the AC automaton
  3. Scan: AC pre-filter, regex verification, condition evaluation

The key changes target how Aho-Corasick handles text files.

Full string literals in the automaton

YARA truncates every atom to 4 bytes. A 19-byte pattern like eval(base64_decode( gets reduced to a single 4-byte substring, whichever scores highest. But any 4-byte sequence will match far more often than the full string.

Yargo puts entire string literals into the AC automaton, so that same pattern enters as all 19 bytes and only matches when the actual obfuscated code is present. This comes at the cost of a larger automaton, but the extra memory usage is on the order of megabytes.
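A toy comparison shows why the longer key matters. The snippet below counts how often a pre-filter key occurs in a harmless line of PHP, since each occurrence would cost one full verification. The sample text is made up, and `eval` is only assumed to be the 4-byte atom; which atom YARA actually picks depends on its scoring.

```go
package main

import (
	"bytes"
	"fmt"
)

// verifications counts how often key occurs in data. In a two-phase scanner,
// every occurrence of the pre-filter key costs one full verification.
func verifications(data []byte, key string) int {
	return bytes.Count(data, []byte(key))
}

// benign is made-up, harmless PHP used only for this illustration.
const benign = `$evaluator->evaluate($retrieval); eval($cfg);`

func main() {
	// Assumed 4-byte atom versus the full 19-byte literal.
	fmt.Println(verifications([]byte(benign), "eval"))
	fmt.Println(verifications([]byte(benign), "eval(base64_decode("))
}
```

The short atom fires four times on this line, while the full literal never does.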

Smarter regex atoms

YARA can use regex atoms as short as 1 byte. Depending on the rules and the input, this can lead to a lot of unnecessary verifications.

Yargo requires a minimum atom length of 3 bytes. With 256^3 (16.7 million) possible values, the chance of a false atom match drops dramatically. We had to adjust a small number of signatures to accommodate this, but the performance gain was significant.
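As a rough back-of-the-envelope check, the sketch below estimates how often a single atom would show up by pure chance. It assumes uniformly random input bytes, which real source code is not, so the numbers only illustrate the scaling with atom length; `expectedFalseHits` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// expectedFalseHits estimates how often one atom of atomLen bytes appears by
// chance in inputBytes of uniformly random data (roughly inputBytes / 256^atomLen).
func expectedFalseHits(inputBytes float64, atomLen int) float64 {
	return inputBytes / math.Pow(256, float64(atomLen))
}

func main() {
	const oneMiB = 1 << 20
	for n := 1; n <= 4; n++ {
		fmt.Printf("atom length %d: %g expected chance hits per MiB\n", n, expectedFalseHits(oneMiB, n))
	}
}
```

Under that assumption a 1-byte atom is expected about 4,096 times per MiB, while a 3-byte atom is expected roughly once every 16 MiB.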

Scoring atoms for source code

YARA scores atom quality generically: common bytes like 0x00 and 0x20 get penalized.

Yargo's quality function is tuned for web source code:

  • Common PHP/JS tokens (return, function, var, ();) are banned from atom selection entirely
  • Alphabetic bytes (very common in source code) score lower than non-ASCII bytes
  • The heuristic picks atoms that discriminate well in the kind of files we actually scan
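The rules above could be sketched roughly as follows. This is not Yargo's quality function: the weights in `byteScore` are invented, the banned list contains only the four tokens named above, and treating an atom as banned when it overlaps a banned token in either direction is an assumption made for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// banned lists tokens too common in PHP/JS to be useful atoms. Only the four
// tokens mentioned in the text are included here.
var banned = [][]byte{[]byte("return"), []byte("function"), []byte("var"), []byte("();")}

// byteScore assigns a weight per byte. The numbers are invented for
// illustration; only their ordering reflects the rules described above.
func byteScore(b byte) int {
	switch {
	case b >= 0x80: // non-ASCII: rare in source code, so most valuable
		return 4
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z': // letters: very common
		return 1
	case b == 0x00, b == 0x20: // NUL and space: worthless (assumption)
		return 0
	default: // digits and punctuation
		return 2
	}
}

// quality returns -1 for atoms that overlap a banned token, otherwise the
// sum of the byte scores.
func quality(atom []byte) int {
	for _, tok := range banned {
		if bytes.Contains(atom, tok) || bytes.Contains(tok, atom) {
			return -1
		}
	}
	score := 0
	for _, b := range atom {
		score += byteScore(b)
	}
	return score
}

// bestAtom slides an n-byte window over pattern and keeps the first window
// with the highest quality. ok is false when every window is banned.
func bestAtom(pattern []byte, n int) (atom []byte, ok bool) {
	bestScore := -1
	for i := 0; i+n <= len(pattern); i++ {
		if q := quality(pattern[i : i+n]); q > bestScore {
			atom, bestScore = pattern[i:i+n], q
		}
	}
	return atom, atom != nil
}

func main() {
	atom, ok := bestAtom([]byte("function($_"), 4)
	fmt.Printf("%q %v\n", atom, ok)
}
```

With these made-up weights, the windows inside `function` are rejected outright and the selection drifts towards the punctuation-heavy tail of the pattern.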

Tuned byte frequencies

The AC pre-filter uses byte frequency data generated from real ecommerce stores, so the scanner knows which bytes are actually rare and can skip ahead more effectively.

Performance

eComscan runs over 57,000 scans per day. Yargo has been processing all of them since early February with no regressions.

The signature optimization alone cut median scan times nearly in half. Deploying Yargo then cut scan times by a further 6.8x: the average scan went from 12.5 minutes to under 2 minutes, and the median scan now completes in under 1 minute.

In its first two weeks of production, Yargo has saved over 116,000 CPU-hours compared to the old engine.

Future work

Yargo currently implements the subset of YARA that we need for our use cases. We're considering making it more backward compatible with YARA, so it can serve as a drop-in replacement for go-yara in other projects.

Yargo is available at github.com/sansecio/yargo under the MIT license.
