Harnessing AtomisticSkills for Agentic Atomistic Research
Bowen Deng, · 2026-05-26 · via cs.AI updates on arXiv.org

Abstract: Computational materials science and chemistry span vast knowledge domains and fractured software ecosystems. Although large language models (LLMs) have demonstrated research capabilities, scaling monolithic agents to manage the rigor and complexity of atomistic research remains a challenge. Here, we introduce AtomisticSkills, an open-source harness framework that empowers general-purpose AI coding agents to conduct atomistic research across materials science, chemistry, and drug discovery. By hierarchically decomposing scientific workflows into agent skills and tools, AtomisticSkills provides agents with modular, extensible, and plug-and-play research capabilities. The framework integrates more than 100 human-curated multidisciplinary skills, including database access, thermodynamics and kinetics modeling, and diverse simulation engines employing machine learning interatomic potentials (MLIPs) and density functional theory (DFT). We validate its functional coverage against scientific literature and demonstrate robust orchestration capabilities across diverse scientific campaigns: generative design of Li-ion solid-state electrolytes, high-throughput screening of metal-organic frameworks for CO2 capture, autonomous MLIP benchmarking and fine-tuning, multi-stage structure-based virtual screening for drug design, multimodal X-ray diffraction pattern analysis, and screening of Fe-oxide catalysts for the oxygen evolution reaction. AtomisticSkills provides critical agent infrastructure for building fully autonomous AI scientists.
Subjects: Chemical Physics (physics.chem-ph); Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
Cite as: arXiv:2605.24002 [physics.chem-ph]
  (or arXiv:2605.24002v1 [physics.chem-ph] for this version)
  https://doi.org/10.48550/arXiv.2605.24002

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Bowen Deng
[v1] Mon, 18 May 2026 21:45:36 UTC (7,943 KB)