Automatic Identification of Extrinsic Bug Reports for Just-In-Time Bug Prediction
Published in Science of Computer Programming, 2025
In software development, developers file bug reports in an Issue Tracking System (ITS) to describe the cause, symptoms, severity, and other technical details of bugs. An ITS contains reports of both intrinsic bugs (i.e., those originating within the software itself) and extrinsic bugs (i.e., those arising from third-party dependencies). Although extrinsic bugs do not appear in any activity recorded in the Version Control System (VCS), Just-In-Time (JIT) bug prediction can still leverage internal software information, such as VCS process metrics.
Previous research has shown that excluding extrinsic bugs can significantly improve the performance of JIT bug prediction models. However, manually classifying bugs as intrinsic or extrinsic is time-consuming and error-prone. To address this issue, we propose the CAN model, which integrates the local feature extraction capability of TextCNN with the nonlinear approximation advantage of the Kolmogorov-Arnold Network (KAN). Experiments on 1,880 labeled samples from the OpenStack project demonstrate that the CAN model outperforms benchmark models such as BERT and CodeBERT, achieving an accuracy of 0.7492 and an F1-score of 0.8072. By comparing datasets with and without source code, we find that incorporating source code information enhances model performance. Finally, using Local Interpretable Model-agnostic Explanations (LIME), an explainable artificial intelligence technique, we find that keywords such as "test" and "api" in bug reports contribute significantly to the prediction of extrinsic bugs.
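To make the TextCNN-plus-KAN idea concrete, the following is a minimal, untrained sketch in plain Python, not the paper's implementation. It assumes a toy random embedding, a single convolution width, Gaussian radial basis functions standing in for the spline basis usually used in KANs, and a made-up bug report sentence; every function name, constant, and weight here is hypothetical and chosen only for illustration.

```python
import math
import random


def embed(tokens, dim, seed=0):
    """Toy embedding (assumption): one fixed pseudo-random vector per distinct token."""
    rng = random.Random(seed)
    table = {tok: [rng.uniform(-1, 1) for _ in range(dim)]
             for tok in sorted(set(tokens))}
    return [table[tok] for tok in tokens]


def textcnn_features(seq, filters):
    """Slide each filter over windows of consecutive tokens, apply ReLU,
    then max-pool over time to get one feature per filter."""
    feats = []
    for filt in filters:                      # filt: `width` weight vectors
        width = len(filt)
        best = 0.0                            # ReLU output is never below 0
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            act = sum(w * x
                      for w_vec, x_vec in zip(filt, window)
                      for w, x in zip(w_vec, x_vec))
            best = max(best, act)
        feats.append(best)
    return feats


def rbf_basis(x, centers, width=0.5):
    """Gaussian bumps used here as a simple stand-in for a spline basis."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


def kan_layer(inputs, coeffs, centers):
    """KAN-style layer: output_j = sum_i phi_ij(x_i), where each edge function
    phi_ij is a weighted sum of basis functions (coeffs[j][i][k])."""
    outputs = []
    for edge_row in coeffs:
        total = 0.0
        for x, edge_coeffs in zip(inputs, edge_row):
            basis = rbf_basis(x, centers)
            total += sum(c * b for c, b in zip(edge_coeffs, basis))
        outputs.append(total)
    return outputs


def predict_extrinsic(tokens, filters, coeffs, centers, dim):
    """Embed tokens, extract local features, apply the KAN-style head, squash."""
    feats = textcnn_features(embed(tokens, dim), filters)
    logit = kan_layer(feats, coeffs, centers)[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical sizes and random (untrained) weights.
rng = random.Random(42)
DIM, N_FILTERS, WIDTH = 4, 3, 2
CENTERS = [0.0, 0.5, 1.0, 1.5, 2.0]
filters = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(WIDTH)]
           for _ in range(N_FILTERS)]
coeffs = [[[rng.uniform(-1, 1) for _ in CENTERS] for _ in range(N_FILTERS)]]

# Made-up report text, used only to exercise the pipeline.
report = "nova api test fails after upgrading the requests library".split()
prob = predict_extrinsic(report, filters, coeffs, CENTERS, DIM)
print(f"untrained extrinsic probability: {prob:.3f}")
```

Because the weights are random, the printed probability carries no meaning; the sketch only shows how max-pooled convolution features can feed a layer whose edges apply learnable univariate functions instead of fixed activations.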
Recommended citation: Guisheng Fan*, Yuguo Liang*, Longfei Zu, Huiqun Yu, Zijie Huang, Wentao Chen. Automatic Identification of Extrinsic Bug Reports for Just-In-Time Bug Prediction. Science of Computer Programming, 2025, 249: 103410. https://doi.org/10.1016/j.scico.2025.103410. [CCF-B/SCI-Q4]
