惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

S
Security Affairs
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
Jina AI
Jina AI
P
Palo Alto Networks Blog
GbyAI
GbyAI
大猫的无限游戏
大猫的无限游戏
A
Arctic Wolf
Hugging Face - Blog
Hugging Face - Blog
小众软件
小众软件
Y
Y Combinator Blog
T
The Blog of Author Tim Ferriss
Blog — PlanetScale
Blog — PlanetScale
S
Schneier on Security
V
Vulnerabilities – Threatpost
C
Cybersecurity and Infrastructure Security Agency CISA
雷峰网
雷峰网
T
Tenable Blog
人人都是产品经理
人人都是产品经理
T
Tor Project blog
C
Cyber Attacks, Cyber Crime and Cyber Security
AWS News Blog
AWS News Blog
Microsoft Security Blog
Microsoft Security Blog
J
Java Code Geeks
Scott Helme
Scott Helme
SecWiki News
SecWiki News
C
CERT Recently Published Vulnerability Notes
Recorded Future
Recorded Future
I
InfoQ
Security Archives - TechRepublic
Security Archives - TechRepublic
Help Net Security
Help Net Security
Cloudbric
Cloudbric
C
Check Point Blog
Engineering at Meta
Engineering at Meta
TaoSecurity Blog
TaoSecurity Blog
B
Blog
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
博客园_首页
N
News and Events Feed by Topic
云风的 BLOG
云风的 BLOG
MyScale Blog
MyScale Blog
腾讯CDC
量子位
Application and Cybersecurity Blog
Application and Cybersecurity Blog
K
Kaspersky official blog
Vercel News
Vercel News
F
Full Disclosure
T
Troy Hunt's Blog
Forbes - Security
Forbes - Security
S
Security @ Cisco Blogs

cs.SE updates on arXiv.org

VLA Foundry: A Unified Framework for Training Vision-Language-Action Models Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery From Particles to Perils: SVGD-Based Hazardous Scenario Generation for Autonomous Driving Systems Testing Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning Neurosymbolic Repo-level Code Localization CodeMMR: Bridging Natural Language, Code, and Image for Unified Retrieval Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility Verification Modulo Tested Library Contracts The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE Scaling Test-Time Compute for Agentic Coding AI-Assisted Requirements Engineering: An Empirical Evaluation Relative to Expert Judgment From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap Vibe-Coding: Feedback-Based Automated Verification with no Human Code Inspection, a Feasibility Study Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-Codex Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks Asking What Matters: Reward-Driven Clarification for Software Engineering Tasks Prompt-Driven Code Summarization: A Systematic Literature Review LinuxArena: A Control Setting for AI Agents in Live Production Software Environments LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB Large Language Models to Enhance Business Process Modeling: Past, Present, and Future Trends CollabCoder: Plan-Code Co-Evolution via Collaborative Decision-Making for Efficient Code Generation Sentiment analysis for software engineering: How far can zero-shot learning (ZSL) go? Learning from Change: Predictive Models for Incident Prevention in a Regulated IT Environment The Cognitive Circuit Breaker: A Systems Engineering Framework for Intrinsic AI Reliability Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents CodeTracer: Towards Traceable Agent States Context Kubernetes: Declarative Orchestration of Enterprise Knowledge for Agentic AI Systems FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Alignment for LLM-Based Multi-Agent Systems Designing Adaptive Digital Nudging Systems with LLM-Driven Reasoning Taking a Pulse on How Generative AI is Reshaping the Software Engineering Research Landscape E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning Ambiguity Detection and Elimination in Automated Executable Process Modeling Compliant But Unsatisfactory: The Gap Between Auditing Standards and Practices for Probabilistic Genotyping Software Resilient Write: A Six-Layer Durable Write Surface for LLM Coding Agents LLMs for Qualitative Data Analysis Fail on Security-specificComments in Human Experiments Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis The Code Whisperer: LLM and Graph-Based AI for Smell and Vulnerability Resolution AutoFlows++: Hierarchical Message Flow Mining for System on Chip Designs DynamicsLLM: a Dynamic Analysis-based Tool for Generating Intelligent Execution Traces Using LLMs to Detect Android Behavioural Code Smells Vibe-driven model-based engineering Machine Learning-Based Detection of MCP Attacks Towards an Appropriate Level of Reliance on AI: A Preliminary Reliance-Control Framework for AI in Software Engineering How Many Tries Does It Take? Iterative Self-Repair in LLM Code Generation Across Model Scales and Benchmarks Intent-aligned Formal Specification Synthesis via Traceable Refinement ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents From Helpful to Trustworthy: LLM Agents for Pair Programming MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis Applying an Agentic Coding Tool for Improving Published Algorithm Implementations Formal Architecture Descriptors as Navigation Primitives for AI Coding Agents Rebooting Microreboot: Architectural Support for Safe, Parallel Recovery in Microservice Systems Can Coding Agents Be General Agents? Automating Structural Analysis Across Multiple Software Platforms Using Large Language Models CCCE: A Continuous Code Calibration Engine for Autonomous Enterprise Codebase Maintenance via Knowledge Graph Traversal and Adaptive Decision Gating Building Trust in the Skies: A Knowledge-Grounded LLM-based Framework for Aviation Safety Contract-Coding: Towards Repo-Level Generation via Structured Symbolic Paradigm ECM Contracts: Contract-Aware, Versioned, and Governable Capability Interfaces for Embodied Agents Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent CODESTRUCT: Code Agents over Structured Action Spaces Chinese Language Is Not More Efficient Than English in Vibe Coding: A Preliminary Study on Token Cost and Problem-Solving Rate Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy WybeCoder: Verified Imperative Code Generation QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability Evaluating Reliability Gaps in Large Language Model Safety via Repeated Prompt Sampling From Scalars to Tensors: Declared Losses Recover Epistemic Distinctions That Neutrosophic Scalars Cannot Express Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts LoRA-MME: Multi-Model Ensemble of LoRA-Tuned Encoders for Code Comment Classification MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion A Pythonic Functional Approach for Semantic Data Harmonisation in the ILIAD Project Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement ACE-Bench: A Lightweight Benchmark for Evaluating Azure SDK Usage Correctness X-SYS: A Reference Architecture for Interactive Explanation Systems KRONE: Scalable LLM-Augmented Log Anomaly Detection via Hierarchical Abstraction Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations VeruSAGE: A Study of Agent-Based Verification for Rust Systems Process-Centric Analysis of Agentic Software Systems Enabling Predictive Maintenance in District Heating Substations: A Labelled Dataset and Fault Detection Evaluation Framework based on Service Data Context-Guided Decompilation: A Step Towards Re-executability Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model From Charts to Code: A Hierarchical Benchmark for Multimodal Models E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task AISysRev -- LLM-based Tool for Title-abstract Screening SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents via Reconstructing Vulnerability-Introducing Scenarios TriagerX: Dual Transformers for Bug Triaging Tasks with Content and Interaction Based Rankings CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation A PennyLane-Centric Dataset to Enhance LLM-based Quantum Code Generation using RAG
Fix the Tests: Augmenting LLMs to Repair Test Cases with Static Collector and Neural Reranker
Jun Liu, Jiwei Yan, Yuanyuan Xie, Jun Yan, Jian Zhang · 2024-07-04 · via cs.SE updates on arXiv.org

During software evolution, it is advocated that test code should co-evolve with production code. In real development scenarios, test updating may lag behind production code changing, which may cause compilation failure or bring other troubles. Existing techniques based on pre-trained language models can be directly adopted to repair obsolete tests caused by such unsynchronized code changes, especially syntactic-related ones. However, the lack of task-oriented contextual information affects the repair accuracy on large-scale projects. Starting from an obsolete test, the key challenging task is precisely identifying and constructing Test-Repair-Oriented Contexts (TROCtxs) from the whole repository within a limited token size. In this paper, we propose SYNTER (SYNtactic-breaking-changes-induced TEst Repair), a novel approach based on LLMs to automatically repair obsolete test cases via precise and concise TROCtxs construction. Inspired by developers' programming practices, we design three types of TROCtx: class context, usage context, and environment context. Given an obsolete test case to repair, SYNTER firstly collects the related code information for each type of TROCtx through static analysis techniques automatically. Then, it generates reranking queries to identify the most relevant TROCtxs, which will be taken as the repair-required key contexts and be input to the large language model for the final test repair. To evaluate the effectiveness of SYNTER, we construct a benchmark dataset that contains a set of obsolete tests caused by syntactic breaking changes. The experimental results show that SYNTER outperforms baseline approaches both on textual- and intent-matching metrics. With the augmentation of constructed TROCtxs, hallucinations are reduced by 57.1%.