






















Abstract: GitHub issue trackers contain millions of developer-written quality concerns, including performance bottlenecks and security vulnerabilities, yet no publicly available GitHub dataset classifies these into fine-grained software quality categories. We construct and release GitReq (GitHub Requirement Issue), a dataset comprising 6,302 expert-validated requirements mined from 55,588 raw GitHub candidates across 4,080 repositories and labeled across eight ISO/IEC 25010:2011-aligned categories: Performance, Security, Portability, Availability, Fault-tolerance, Scalability, Maintainability, and a Functional baseline. Dataset construction involved category-specific triple-signal GitHub mining, separate preprocessing pipelines for non-functional requirements (NFRs) and functional requirements (FRs) with per-category parameters, and expert human annotation that achieved substantial inter-annotator agreement (Fleiss' Kappa = 0.72). Zero-shot evaluation with four large language models (LLMs) establishes baselines, with GPT-5.2 reaching the highest macro-averaged F1 of 0.641. GitReq is publicly released with full materials to advance research in automated requirement classification and software quality analysis.
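The two metrics quoted in the abstract, Fleiss' Kappa for inter-annotator agreement and macro-averaged F1 for classification, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names, the input layout (one label per annotator per item), and the convention of scoring an F1 of 0.0 for a class with no true or predicted instances are assumptions made for illustration. Only the eight category names come from the abstract.

```python
from collections import Counter

# The eight GitReq labels as listed in the abstract.
CATEGORIES = [
    "Performance", "Security", "Portability", "Availability",
    "Fault-tolerance", "Scalability", "Maintainability", "Functional",
]


def macro_f1(y_true, y_pred, labels=CATEGORIES):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumption: a class that never occurs and is never predicted scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fleiss_kappa(ratings, labels=CATEGORIES):
    """Fleiss' Kappa for items rated by the same number of annotators.

    ratings: list of items, each a list holding one label per annotator
    (hypothetical layout; at least two annotators per item).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the per-item agreement P_i.
    p_items = [
        (sum(c[label] ** 2 for label in labels) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    p_cat = [sum(c[label] for c in counts) / (n_items * n_raters) for label in labels]
    p_e = sum(p ** 2 for p in p_cat)

    if p_e == 1.0:  # every rating falls in one category; kappa is degenerate
        return 1.0
    return (p_bar - p_e) / (1 - p_e)
```

Macro averaging weights all eight categories equally regardless of how many issues each contains, which is why it is a common choice when class sizes differ.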
From: Md Rakibul Islam
[v1] Sat, 20 Jun 2026 00:05:32 UTC (55 KB)