

ScratchLens: Lens-Parametric Behavioral Equivalence for Scratch Programs
[Submitted on 14 Jun 2026] · 2026-06-16 · via cs.SE updates on arXiv.org


Abstract: Two Scratch programs can be syntactically far apart (renamed variables, split scripts, extracted custom blocks, or reordered initialization) and still behave identically; a one-block edit, such as replacing a blocking broadcast with an asynchronous one, can create divergences visible only under specific schedules. Deciding behavioral equivalence is central to automated feedback, grading support, and repair validation, yet tree differencing is too strict and single-run dynamic comparison is unsound for concurrent, random, and timing-dependent behavior.
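As a toy illustration of why a single run can miss a schedule-dependent divergence, the Python sketch below enumerates the interleavings of a sender script and a receiver script. The step functions, the shared counter `x`, and the scheduling model are assumptions made for illustration only; they are not the Scratch VM's scheduler and are not taken from the paper.

```python
from itertools import combinations
from typing import Callable, Iterator, List, Set

Step = Callable[[int], int]  # hypothetical atomic step acting on one shared variable x


def set_one(x: int) -> int:
    return 1


def double(x: int) -> int:  # receiver script body
    return x * 2


def add_one(x: int) -> int:  # sender step that follows the broadcast
    return x + 1


def run(steps: List[Step], x: int = 0) -> int:
    """Apply the steps in order and return the final value of x."""
    for step in steps:
        x = step(x)
    return x


def interleavings(a: List[Step], b: List[Step]) -> Iterator[List[Step]]:
    """Yield every merge of a and b that preserves the order within each list."""
    total = len(a) + len(b)
    for positions in combinations(range(total), len(a)):
        slots = set(positions)
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if i in slots else next(it_b) for i in range(total)]


def outcomes_blocking() -> Set[int]:
    # Blocking broadcast (toy model): the receiver finishes before the sender resumes.
    return {run([set_one, double, add_one])}


def outcomes_async() -> Set[int]:
    # Asynchronous broadcast (toy model): the receiver may run before or after
    # the sender's next step.
    return {run([set_one] + order) for order in interleavings([double], [add_one])}


print(outcomes_blocking())  # {3}
print(outcomes_async())     # {3, 4}
```

In this model, one schedule of the asynchronous variant yields the same final value as the blocking variant, so a single dynamic run could report the two programs as equal even though another schedule produces 4.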
We observe that equivalence for block-based programs is lens-parametric: final state, frame traces, monitors, event causality, and debug traces induce different observation relations. SPECTRA makes this explicit through a taxonomy of causal divergence phenomena and observation lenses. It compiles Scratch projects into a causal IR of typed resources and semantic transactions, canonicalizes renamings, guards, and procedure bodies, quotients same-trigger concurrency with Mazurkiewicz trace normal forms, separates program order from races, and handles residual frontiers through SMT obligations and VM-backed counterexample-guided refinement. Conclusive verdicts carry evidence: equivalence by bijection and trace quotient, difference by a typed witness, and unresolved cases remain unknown.
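The idea of quotienting concurrent executions by a Mazurkiewicz trace normal form can be sketched with the standard Foata normal form: two event sequences belong to the same trace exactly when their normal forms coincide. The event encoding `(script, kind, variable)` and the dependence rule below (same script, or same variable with at least one write) are assumptions chosen for illustration; the paper's causal IR and its actual independence relation are not described in the abstract.

```python
from typing import Callable, List, Sequence, Tuple

Event = Tuple[str, str, str]  # (script, "r" or "w", variable): hypothetical encoding


def dependent(a: Event, b: Event) -> bool:
    """Assumed dependence: program order within a script, or a conflicting access."""
    same_script = a[0] == b[0]
    conflict = a[2] == b[2] and "w" in (a[1], b[1])
    return same_script or conflict


def foata_normal_form(
    trace: Sequence[Event],
    dep: Callable[[Event, Event], bool] = dependent,
) -> Tuple[Tuple[Event, ...], ...]:
    """Group events into steps of pairwise independent events, sorted within each step."""
    levels: List[List[Event]] = []
    placed: List[Tuple[Event, int]] = []
    for ev in trace:
        level = 0
        for prev, prev_level in placed:
            if dep(prev, ev):
                # ev must come strictly after every earlier event it depends on
                level = max(level, prev_level + 1)
        placed.append((ev, level))
        if level == len(levels):
            levels.append([])
        levels[level].append(ev)
    return tuple(tuple(sorted(step)) for step in levels)


t1 = [("A", "w", "x"), ("B", "w", "y"), ("B", "r", "x")]
t2 = [("B", "w", "y"), ("A", "w", "x"), ("B", "r", "x")]  # swaps two independent events
t3 = [("B", "w", "y"), ("B", "r", "x"), ("A", "w", "x")]  # reorders a read/write race on x

print(foata_normal_form(t1) == foata_normal_form(t2))  # True
print(foata_normal_form(t1) == foata_normal_form(t3))  # False
```

Here `t1` and `t2` collapse to one representative because they differ only in the order of independent events, while `t3` stays distinct because it reorders a conflicting read and write.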
On a VM-witnessed mutation corpus from real Scratch projects, SPECTRA decides all 444 validated pairs and makes 0/158 false-equivalence claims on witnessed-different pairs under strict scoring. Structural, dynamic-only, and LLM baselines fail on the classes predicted by the taxonomy; ablations quantify the contribution of partial-order reduction and lens parametricity; and targeted scenarios expose ambiguous-mutant divergences missed by random testing.
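To make lens parametricity concrete, the following minimal Python sketch compares two recorded runs under two observation lenses. The lens names follow the abstract, but the run representation (a list of variable snapshots) and the function names are hypothetical, and comparing single recorded runs is only a simplification for illustration, not the decision procedure the abstract describes.

```python
from typing import Callable, Dict, Hashable, List

Frame = Dict[str, int]  # hypothetical snapshot of the variables after one step
Run = List[Frame]       # hypothetical recording of one execution


def final_state(run: Run) -> Hashable:
    """Observe only the last snapshot."""
    return tuple(sorted(run[-1].items()))


def frame_trace(run: Run) -> Hashable:
    """Observe every intermediate snapshot in order."""
    return tuple(tuple(sorted(frame.items())) for frame in run)


LENSES: Dict[str, Callable[[Run], Hashable]] = {
    "final_state": final_state,
    "frame_trace": frame_trace,
}


def equivalent_under(lens: str, run_a: Run, run_b: Run) -> bool:
    """Two runs are related under a lens when the lens cannot tell them apart."""
    observe = LENSES[lens]
    return observe(run_a) == observe(run_b)


run_a = [{"x": 1}, {"x": 2}]  # sets x to 1, then to 2
run_b = [{"x": 2}]            # sets x to 2 directly

print(equivalent_under("final_state", run_a, run_b))  # True
print(equivalent_under("frame_trace", run_a, run_b))  # False
```

The same pair of runs is equivalent under the final-state lens and different under the frame-trace lens, which is why a verdict only makes sense relative to a chosen lens.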

Submission history

From: Yuan Si
[v1] Sun, 14 Jun 2026 13:38:35 UTC (40 KB)