Specification-Based Code-Text-Code Reengineering for LLM-Mediated Software Evolution
Oleg Grynets · 2026-05-26 · via cs.AI updates on arXiv.org

Abstract: Direct Code2Code transformation remains difficult to control because it can preserve surface-level syntax while introducing semantic drift, hidden behavioral changes, loss of traceability, non-idiomatic target implementations, or incomplete reconstruction of domain logic. This paper proposes a specification-based Code2Text2Code reengineering framework for LLM-mediated software evolution. The central idea is to transform source code into a neutral textual specification that captures program behavior, identifiers, computational flow, conditions, side effects, data dependencies, and domain-specific intent without directly transferring the syntax of the source language. The framework combines factual context extraction, Code2Text generation, iterative verification between the source code and the text specification, Text2Code generation, target code verification, retrieval-augmented grounding, semantic-aware chunking, and transformation loss estimation. The knowledge representation layer integrates metadata derived from the AST, graph-based dependency structures, neutral natural-language specifications, technical documentation, business documentation, and architecture-level representations. The experiments include a Code2Text2Code dataset built from multiple programming languages and SQL dialects, a comparison of intermediate representations, retrieval evaluation, documentation transformation evaluation, and prompt tuning with DSPy. To estimate transformation losses, a graph formalization is implemented that uses structural preservation, reverse compatibility, interface stability, and total graph similarity. The results support interpreting the Code2Text2Code approach not as a simple code transformation, but as a controlled, specification-based reengineering process for LLM-mediated software evolution.
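The abstract names four graph measures (structural preservation, reverse compatibility, interface stability, total graph similarity) but does not define them. The sketch below is one possible reading, not the paper's formalization: it assumes a dependency graph is a set of directed edges plus a set of public interface names, takes structural preservation as the share of source edges kept in the target, reverse compatibility as the share of target edges traceable to the source, interface stability as the Jaccard overlap of interface names, and total similarity as a weighted mean. `CodeGraph`, every function name, and the equal weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeGraph:
    """Hypothetical dependency graph: directed edges plus public interface names."""
    edges: frozenset
    interface: frozenset


def _ratio(numerator: int, denominator: int) -> float:
    # Assumption: an empty reference set counts as fully preserved.
    return 1.0 if denominator == 0 else numerator / denominator


def structural_preservation(src: CodeGraph, tgt: CodeGraph) -> float:
    # Share of source edges that survive in the target graph.
    return _ratio(len(src.edges & tgt.edges), len(src.edges))


def reverse_compatibility(src: CodeGraph, tgt: CodeGraph) -> float:
    # Share of target edges that can be traced back to the source graph.
    return _ratio(len(src.edges & tgt.edges), len(tgt.edges))


def interface_stability(src: CodeGraph, tgt: CodeGraph) -> float:
    # Jaccard overlap of the public interface names.
    shared = src.interface & tgt.interface
    combined = src.interface | tgt.interface
    return _ratio(len(shared), len(combined))


def total_graph_similarity(src, tgt, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    # Weighted mean of the three measures; equal weights are an assumption.
    scores = (
        structural_preservation(src, tgt),
        reverse_compatibility(src, tgt),
        interface_stability(src, tgt),
    )
    return sum(w * s for w, s in zip(weights, scores))


def transformation_loss(src, tgt, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    # Loss estimate as the complement of total similarity.
    return 1.0 - total_graph_similarity(src, tgt, weights)


if __name__ == "__main__":
    source = CodeGraph(
        edges=frozenset({("main", "load"), ("main", "report"), ("load", "parse")}),
        interface=frozenset({"main", "load"}),
    )
    target = CodeGraph(
        edges=frozenset({("main", "load"), ("load", "parse"), ("main", "log")}),
        interface=frozenset({"main", "load", "log"}),
    )
    print(f"loss = {transformation_loss(source, target):.3f}")
```

In this toy case the target drops one source edge and adds one edge and one interface name, so each of the three measures evaluates to 2/3. A real pipeline would first need to map identifiers across languages before comparing edges, which this sketch leaves out.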
Comments: 15 pages, 9 figures, 7 tables, 39 references
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
MSC classes: 68T07 (Primary), 03G99 (Secondary)
ACM classes: C.5; D.2; E.4; F.2; F.3; F.4; H.1; H.4; H.5; J.1
Cite as: arXiv:2605.25232 [cs.SE]
  (or arXiv:2605.25232v1 [cs.SE] for this version)
  https://doi.org/10.48550/arXiv.2605.25232

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Vasyl Lyashkevych Yaremovych
[v1] Sun, 24 May 2026 19:36:04 UTC (2,487 KB)