Automatic Identification of Extrinsic Bug Reports for Just-In-Time Bug Prediction
Published in Science of Computer Programming, 2025
• We extract bug reports from the OpenStack project and apply various text classification techniques to evaluate how effectively they identify extrinsic bugs. Experimental results indicate that text classification techniques can effectively identify extrinsic bugs from bug reports, and our proposed CAN model achieves the highest F1 score.
• We examine the role of source code in identifying extrinsic bugs. We hypothesize that the source code embedded in bug reports carries useful information, and we compare classification performance on datasets that include and exclude this code. The results indicate that treating source code as text improves the models’ ability to identify extrinsic bugs, while excluding it generally degrades their performance.
• We use LIME to analyze feature importance in the CAN model’s correct predictions. The results show that words such as "line" and "py" play a significant role in classifying intrinsic bugs, while terms such as "test" and "api" strongly influence the classification of extrinsic bugs.
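The CAN model itself is not reproduced here. As a minimal sketch of the workflow the bullets describe, under stated assumptions, the Python below trains a multinomial Naive Bayes baseline (a swapped-in technique, not CAN) on a few invented bug reports, compares a prediction with and without source code using a hypothetical regex heuristic, and ranks words by leave-one-word-out occlusion as a lightweight stand-in for LIME. The names `CODE_LINE`, `strip_code`, `NaiveBayes`, `occlusion_importance`, and the toy reports are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical heuristic: a line counts as "source code" if it looks like a
# traceback frame, an exception line, a definition/import, or a bare call.
CODE_LINE = re.compile(r'\s*(Traceback|File "|def |class |import |\w+Error:|[\w.]+\(.*\)\s*$)')


def strip_code(report: str) -> str:
    """Return the report with every heuristically detected code line removed."""
    return "\n".join(ln for ln in report.splitlines() if not CODE_LINE.match(ln))


def tokenize(text: str) -> list[str]:
    # Splitting on non-alphanumerics turns "nova/api.py", line 7 into
    # nova, api, py, line, 7, so code survives as ordinary word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a baseline, not CAN)."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc: str) -> dict[str, float]:
        # Log prior plus smoothed log likelihood of each in-vocabulary token.
        v = len(self.vocab)
        words = [w for w in tokenize(doc) if w in self.vocab]
        return {
            c: self.prior[c]
            + sum(math.log((self.counts[c][w] + 1) / (self.total[c] + v)) for w in words)
            for c in self.classes
        }

    def predict(self, doc: str) -> str:
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


def occlusion_importance(model: NaiveBayes, doc: str, target: str) -> list[tuple[str, float]]:
    """Rank words by how much deleting them lowers the target-class margin.

    This is leave-one-word-out occlusion, a simpler stand-in for LIME (which
    instead fits a locally weighted linear model on random perturbations)."""
    def margin(text: str) -> float:
        s = model.log_scores(text)
        return s[target] - max(v for c, v in s.items() if c != target)

    words = tokenize(doc)
    base = margin(doc)
    scored = [(w, base - margin(" ".join(x for x in words if x != w))) for w in set(words)]
    return sorted(scored, key=lambda p: p[1], reverse=True)


# Invented toy reports for illustration only (not OpenStack data).
train_docs = [
    'Traceback: File "nova/compute.py", line 42\nKeyError: uuid',
    "Crash in scheduler.py line 10 when list is empty",
    "Gate test fails after upstream api change in dependency",
    "Tests break because library api was updated",
]
train_labels = ["intrinsic", "intrinsic", "extrinsic", "extrinsic"]
model = NaiveBayes().fit(train_docs, train_labels)

report = 'Volume attach fails\nTraceback (most recent call last):\n  File "nova/api.py", line 7\nKeyError: id'
print(model.predict(report))              # intrinsic (code tokens kept)
print(model.predict(strip_code(report)))  # extrinsic (only "Volume attach fails" remains)
print(occlusion_importance(model, report, "intrinsic")[:2])
```

On this toy data, keeping the traceback lets tokens such as py and line pull the report toward the intrinsic class, while stripping them leaves only "fails", which the model associates with extrinsic reports; the example only illustrates the mechanism.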
Recommended citation: Guisheng Fan*, Yuguo Liang*, Longfei Zu, Huiqun Yu, Zijie Huang, Wentao Chen. Automatic Identification of Extrinsic Bug Reports for Just-In-Time Bug Prediction. Science of Computer Programming, 2025, 249: 103410. https://doi.org/10.1016/j.scico.2025.103410. [CCF-B/SCI-Q4]
