Abstract: We present a dataset of adversarial malware samples derived from the public RawMal-TF collection of real-world malware binaries. Using a suite of adversarial malware generators, we construct two sets of adversarial PE files: 44,347 family-labelled samples and 33,596 type-labelled samples, achieving evasion rates of 98.35 % and 92.20 % against the EMBER classifier, respectively. Each adversarial binary is accompanied by detailed metadata, including EMBER scores and VirusTotal classifications. We further demonstrate the susceptibility of malware classification pipelines to data poisoning attacks through a series of training experiments. Injecting fully mislabelled adversarial samples representing only 0.5 % of the training data in the family-labelled dataset increases the evasion rate against the re-trained classifier from 26.1 % to 92.8 %. The dataset is publicly released to facilitate future research on adversarial malware, poisoning attacks, and the robustness of machine-learning-based malware detection systems.
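The two headline quantities in the abstract, the evasion rate and the poisoning budget, can be made concrete with a small sketch. This is not the authors' code; it is a minimal illustration under stated assumptions: a sample is assumed to evade when its classifier score falls strictly below a detection threshold, and the 0.5 % budget is assumed to be measured against the poisoned training set (clean plus injected samples). The helper names `evasion_rate` and `poison_count`, the threshold, and the scores are all hypothetical.

```python
import math
from typing import Sequence


def evasion_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of malware samples the detector misses.

    Assumption (hypothetical): a sample is flagged as malicious when
    score >= threshold, so any score strictly below it is an evasion.
    """
    if not scores:
        raise ValueError("need at least one score")
    evaded = sum(1 for s in scores if s < threshold)
    return evaded / len(scores)


def poison_count(n_clean: int, fraction: float) -> int:
    """Smallest number of injected samples so that they make up at least
    `fraction` of the poisoned training set (clean + injected).

    Hypothetical helper: the abstract does not specify how the 0.5 %
    budget is computed.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    # Solve k / (n_clean + k) >= fraction for the smallest integer k.
    return math.ceil(fraction * n_clean / (1.0 - fraction))


# Hypothetical scores for five adversarial samples and a hypothetical threshold.
scores = [0.10, 0.35, 0.92, 0.05, 0.60]
print(evasion_rate(scores, threshold=0.8))  # 4 of 5 below 0.8 -> 0.8
print(poison_count(100_000, 0.005))         # 503 injected samples
```

Defining the budget relative to the poisoned set rather than the clean set changes the count only slightly at 0.5 % (503 versus 500 for 100,000 clean samples), but the convention matters when comparing poisoning rates across papers.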
| Subjects: | Cryptography and Security (cs.CR); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.25937 [cs.CR] |
| | (or arXiv:2605.25937v1 [cs.CR] for this version) |
| | https://doi.org/10.48550/arXiv.2605.25937 (arXiv-issued DOI via DataCite, pending registration) |
From: Martin Jureček
[v1] Mon, 25 May 2026 15:17:02 UTC (868 KB)