
Multi-Agent Specification-based Metamorphic Testing of FMU-Based Simulations
Ashir Kulshr · 2026-05-26 · via cs.AI updates on arXiv.org


Abstract: In many industrial domains, the Functional Mock-up Interface (FMI) is used to exchange simulation models as Functional Mock-up Units (FMUs) between partners that use different modelling tools. This opens up possibilities for simulation-based verification and validation, using FMUs to ensure reliable system behaviour. However, deriving effective test oracles for these simulation models remains challenging because explicit expected outputs are absent. This limits the applicability of conventional testing approaches, which require access to the internal workings of the systems. Metamorphic testing (MT) addresses this limitation by leveraging metamorphic relations (MRs), but extracting such relations from specifications remains a largely manual and error-prone process. To address this challenge, we propose an LLM-powered multi-agent workflow for specification-based metamorphic testing of FMU-based simulation models. The approach takes functional and interface specifications as input and orchestrates multiple agents to extract requirements and derive MRs. The MRs are expressed using Given-When-Then patterns that structure input conditions (Given), transformations (When), and expected output behaviours (Then). They are then used to generate metamorphic test cases, execute simulations, and evaluate output consistency across multiple sessions. We evaluate the approach on a Lube Oil Cooling system FMU, demonstrating its ability to automatically generate meaningful MRs and corresponding test cases. Preliminary results indicate that the proposed workflow can support the systematic verification and validation of dynamic simulation models by reducing manual effort and improving test generation.
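To make the Given-When-Then structure of a metamorphic relation concrete, the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the `simulate_cooler` function is a toy stand-in for running an FMU, the input names are hypothetical, and the relation "doubling coolant flow must not raise the oil outlet temperature" is an invented example rather than one reported in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Inputs = Dict[str, float]


def simulate_cooler(inputs: Inputs) -> float:
    """Hypothetical stand-in for an FMU run.

    Returns a steady-state oil outlet temperature using a toy
    effectiveness model: more coolant flow removes more heat.
    """
    oil_in = inputs["oil_inlet_temp"]
    coolant_in = inputs["coolant_inlet_temp"]
    flow = inputs["coolant_flow"]
    effectiveness = flow / (flow + 1.0)  # stays in [0, 1) for flow >= 0
    return oil_in - effectiveness * (oil_in - coolant_in)


@dataclass
class MetamorphicRelation:
    name: str
    given: Callable[[Inputs], bool]        # precondition on the source input
    when: Callable[[Inputs], Inputs]       # transformation to the follow-up input
    then: Callable[[float, float], bool]   # expected relation between the two outputs


def check(mr: MetamorphicRelation,
          simulate: Callable[[Inputs], float],
          source: Inputs) -> bool:
    """Run the source and follow-up simulations and compare their outputs."""
    if not mr.given(source):
        raise ValueError(f"source input violates the Given clause of {mr.name!r}")
    follow_up = mr.when(source)
    return mr.then(simulate(source), simulate(follow_up))


# Illustrative relation (an assumption, not taken from the paper).
more_flow_cools_more = MetamorphicRelation(
    name="doubling coolant flow does not raise oil outlet temperature",
    given=lambda x: x["oil_inlet_temp"] > x["coolant_inlet_temp"]
    and x["coolant_flow"] > 0,
    when=lambda x: {**x, "coolant_flow": x["coolant_flow"] * 2.0},
    then=lambda source_out, follow_up_out: follow_up_out <= source_out,
)

source = {"oil_inlet_temp": 80.0, "coolant_inlet_temp": 20.0, "coolant_flow": 1.0}
result = check(more_flow_cools_more, simulate_cooler, source)
print(more_flow_cools_more.name, "->", "holds" if result else "violated")
```

No expected output value is needed anywhere in this sketch: the check compares two runs of the same model against each other, which is how metamorphic testing sidesteps the missing test oracle.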
Comments: Author version. 9 pages. Accepted for publication in the 10th International Workshop on Metamorphic Testing (MET 2026) of the IEEE Conference on Computers, Software, and Applications (COMPSAC 2026), June 7-10, 2026, Madrid, Spain
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Cite as: arXiv:2605.25101 [cs.SE]
  (or arXiv:2605.25101v1 [cs.SE] for this version)
  https://doi.org/10.48550/arXiv.2605.25101

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Dragos Truscan
[v1] Sun, 24 May 2026 14:30:56 UTC (6,946 KB)