ASSEMBLAGE-DEEPHISTORY: A Cross-Build Binary Dataset with Temporal Coverage
Chang Liu, N · 2026-05-23 · via cs.LG updates on arXiv.org

Abstract: Existing binary corpora typically capture only one or two axes of binary variation: they either provide cross-compiler builds without a temporal axis, or CVE labels for single-build binaries. None combine cross-build diversity, cross-version history, and CVE labels into a queryable structure. We present ASSEMBLAGE-DEEPHISTORY, which consolidates these dimensions into a unified framework where every binary's compilation context, source code, vulnerable functions, and package version are stored as first-class metadata.
ASSEMBLAGE-DEEPHISTORY comprises 73,610 binaries spanning 248 open-source projects, compiled across GCC, Clang, and MSVC at multiple optimization levels on Linux and Windows, with multi-year historical builds. Each binary is indexed in a database that links it to its source code, functions, debug info, variant builds, historical versions, and vulnerable functions. Three analyses demonstrate this structure's value: (1) a three-stage LLM benchmark (recognition, strategy-guided detection, and cross-build transfer) to test whether LLMs reason about binary vulnerabilities or pattern-match on build-specific artifacts; (2) a comparison of MalConv embeddings, jTrans function embeddings, and TLSH fuzzy hashes quantifying how same-package versions cluster in each space; and (3) a Bayesian regression decomposing binary similarity into contributions from temporal distance, file changes, and commits.
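
To make the idea of a queryable structure concrete, here is a minimal sketch of how binaries, their build context, and CVE-labeled functions could be linked and queried. Everything in it is an assumption for illustration: the table names, column names, the `demo-lib` package, and the placeholder CVE identifier are hypothetical and do not reflect the dataset's actual schema or contents.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the layout used by ASSEMBLAGE-DEEPHISTORY.
SCHEMA = """
CREATE TABLE binaries (
    id        INTEGER PRIMARY KEY,
    package   TEXT NOT NULL,
    version   TEXT NOT NULL,
    compiler  TEXT NOT NULL,
    opt_level TEXT NOT NULL,
    platform  TEXT NOT NULL
);
CREATE TABLE functions (
    id        INTEGER PRIMARY KEY,
    binary_id INTEGER NOT NULL REFERENCES binaries(id),
    name      TEXT NOT NULL,
    cve_id    TEXT  -- NULL when the function carries no CVE label
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO binaries VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "demo-lib", "1.0", "gcc", "O0", "linux"),
            (2, "demo-lib", "1.0", "clang", "O2", "linux"),
            (3, "demo-lib", "1.1", "gcc", "O0", "linux"),
        ],
    )
    conn.executemany(
        "INSERT INTO functions VALUES (?, ?, ?, ?)",
        [
            (1, 1, "parse_header", "CVE-0000-0001"),  # placeholder CVE id
            (2, 2, "parse_header", "CVE-0000-0001"),
            (3, 3, "parse_header", None),  # not labeled in the later version
        ],
    )
    return conn


def variant_builds_with_cve(conn, cve_id):
    """Return (package, version, compiler, opt_level) for every build
    that contains a function labeled with the given CVE id."""
    rows = conn.execute(
        """
        SELECT b.package, b.version, b.compiler, b.opt_level
        FROM functions AS f
        JOIN binaries AS b ON b.id = f.binary_id
        WHERE f.cve_id = ?
        ORDER BY b.id
        """,
        (cve_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in variant_builds_with_cve(demo, "CVE-0000-0001"):
        print(row)
```

A single join of this kind is what lets a cross-build transfer experiment pull every compiler and optimization variant of the same vulnerable function, while the `version` column supplies the temporal axis.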
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Software Engineering (cs.SE)
Cite as: arXiv:2605.21615 [cs.CR]
  (or arXiv:2605.21615v1 [cs.CR] for this version)
  https://doi.org/10.48550/arXiv.2605.21615

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Chang Liu
[v1] Wed, 20 May 2026 18:23:17 UTC (872 KB)