Finding Duplicates in 1.1M BDD Steps: cukereuse, a Paraphrase-Robust Static Detector for Cucumber and Gherkin
Ali Hassaan · 2026-04-23 · via cs.CL updates on arXiv.org

Abstract: Behaviour-Driven Development (BDD) suites accumulate step-text duplication whose
maintenance cost is established in prior work. Existing detection techniques require
running the tests (Binamungu et al., 2018-2023) or are confined to a single
organisation (Irshad et al., 2020-2022), leaving a gap: a purely static,
paraphrase-robust, step-level detector usable on any repository. We fill the gap
with cukereuse, an open-source Python CLI combining exact hashing, Levenshtein
ratio, and sentence-transformer embeddings in a layered pipeline, released alongside
an empirical corpus of 347 public GitHub repositories, 23,667 parsed .feature
files, and 1,113,616 Gherkin steps. The step-weighted exact-duplicate rate is
80.2 %; the median-repository rate is 58.6 % (Spearman rho = 0.51 with size). The top
hybrid cluster groups 20.7k occurrences across 2.2k files. Against 1,020 pairs
manually labelled by the three authors under a released rubric (inter-annotator
Fleiss' kappa = 0.84 on a 60-pair overlap), we report precision, recall, and F1 with
bootstrap 95 % CIs under two protocols: the primary rubric and a score-free
second-pass relabelling. The strongest honest pair-level number is near-exact at F1
= 0.822 on score-free labels; the primary-rubric semantic F1 = 0.906 is inflated by
a stratification artefact that pins recall at 1.000. Lexical baselines
(SourcererCC-style, NiCad-style) reach primary F1 = 0.761 and 0.799. The paper also
presents a CDN-structured critique of Gherkin (Cognitive Dimensions of Notations);
eight of fourteen dimensions are rated problematic or unsupported. The tool, corpus,
labelled pairs, rubric, and pipeline are released under permissive licences.
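
The layered pipeline described in the abstract (exact hashing, then Levenshtein ratio, then embeddings) can be illustrated with a minimal sketch. This is not the cukereuse implementation: the function names, the normalisation rule, the `1 - distance / max_length` similarity, and both thresholds are assumptions chosen for illustration. The embedding layer is left as a pluggable callable (`embed_similarity`, hypothetical) so the sketch stays dependency-free; in practice a sentence-transformer cosine similarity could be passed in there.

```python
import hashlib
import re

# Gherkin step keywords stripped before comparison (assumed normalisation).
STEP_KEYWORDS = ("Given", "When", "Then", "And", "But")


def normalise(step: str) -> str:
    """Drop the leading keyword, collapse whitespace, lowercase."""
    text = step.strip()
    for kw in STEP_KEYWORDS:
        if text.startswith(kw + " "):
            text = text[len(kw) + 1:]
            break
    return re.sub(r"\s+", " ", text).strip().lower()


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """One common normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def classify_pair(a, b, near_threshold=0.9,
                  embed_similarity=None, semantic_threshold=0.8):
    """Run the layers cheapest-first; thresholds are illustrative assumptions."""
    na, nb = normalise(a), normalise(b)
    # Layer 1: exact match on a hash of the normalised text.
    if (hashlib.sha256(na.encode()).hexdigest()
            == hashlib.sha256(nb.encode()).hexdigest()):
        return "exact"
    # Layer 2: near-exact match by edit-distance similarity.
    if similarity(na, nb) >= near_threshold:
        return "near-exact"
    # Layer 3: optional semantic match via a caller-supplied scorer.
    if embed_similarity is not None and embed_similarity(na, nb) >= semantic_threshold:
        return "semantic"
    return "distinct"
```

For example, `classify_pair("Given I am logged in", "And  I am logged in")` falls out at the first layer, while a one-character variant such as "login" versus "log-in" is caught by the second. Ordering the layers cheapest-first means the expensive embedding comparison only runs on pairs the lexical layers could not resolve.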
Comments: 39 pages, 9 figures, 8 tables. Under review at Software Quality Journal. Tool, corpus, labelled benchmark, and rubric released at this https URL under Apache-2.0
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Information Retrieval (cs.IR)
ACM classes: D.2.5; D.2.7; I.2.7
Cite as: arXiv:2604.20462 [cs.SE]
  (or arXiv:2604.20462v1 [cs.SE] for this version)
  https://doi.org/10.48550/arXiv.2604.20462

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ali Hassaan Mughal
[v1] Wed, 22 Apr 2026 11:44:05 UTC (240 KB)