


























This paper explores the use of text data augmentation techniques to enhance conflict and duplicate detection in software engineering tasks through sentence pair classification. The study adapts generic augmentation techniques such as shuffling, back translation, and paraphrasing and proposes new data augmentation techniques such as Noun-Verb Substitution, target-lemma replacement and Actor-Action Substitution for software requirement texts. A comprehensive empirical analysis is conducted on six software text datasets to identify conflicts and duplicates among sentence pairs. The results demonstrate that data augmentation techniques have a significant impact on the performance of all software pair text datasets. On the other hand, in cases where the datasets are relatively balanced, the use of augmentation techniques may result in a negative effect on the classification performance.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。