Publications

Journal Articles

Automatic Identification of Extrinsic Bug Reports for Just-In-Time Bug Prediction

Published in Science of Computer Programming, 2025

• We extract bug reports from the OpenStack project and apply various text classification techniques to evaluate how effectively they identify extrinsic bugs. Experimental results indicate that text classification techniques can effectively identify extrinsic bugs from bug reports. Our proposed CAN model achieves the best F1 score.
• We examine the role of source code in identifying extrinsic bugs. We hypothesize that the source code present in bug reports carries valuable information, and we compare classification performance on datasets with and without code. The results indicate that including source code as text enhances the models’ ability to identify extrinsic bugs, while excluding it generally degrades performance.
• We employ LIME to analyze feature importance in the CAN model’s correct predictions. Experimental results show that words such as "line" and "py" play a significant role in classifying intrinsic bugs, while terms like "test" and "api" strongly influence the classification of extrinsic bugs.

Recommended citation: Guisheng Fan^*, Yuguo Liang^*, Longfei Zu, Huiqun Yu, Zijie Huang, Wentao Chen. Automatic Identification of Extrinsic Bug Reports for Just-In-Time Bug Prediction. Science of Computer Programming, 2025, 249: 103410. https://doi.org/10.1016/j.scico.2025.103410. [CCF-B/SCI-Q4]
Download Paper | Download Bibtex

Usage patterns of software product metrics in assessing developers’ output: A comprehensive study

Published in Information and Software Technology, 2025

• We explore how developers can use LLMs to intentionally manipulate the lines-of-code (LOC) metric, producing significant anomalies that undermine the fairness and effectiveness of developers’ output assessments.
• We provide a thorough evaluation of existing product metrics, focusing on efficiency metrics and on quality metrics from static analysis tools (SATs), and assess their practicality and cost-effectiveness.
• We conduct a rapid review of the quantitative metrics used in prior research on developers’ output, helping the company select relevant software metrics and providing guidance for future quantitative assessments of developers’ output.
• We connect academic research on software product metrics with practical applications in industry, demonstrating how academic insights can influence real-world practices for assessing developers’ output.

Recommended citation: Wentao Chen, Huiqun Yu^*, Guisheng Fan^*, Zijie Huang, Yuguo Liang. Usage Patterns of Software Product Metrics in Assessing Developers’ Output: A Comprehensive Study. Information and Software Technology, 2025, 189: 107935. https://doi.org/10.1016/j.infsof.2025.107935. [CCF-B/SCI-Q2]
Download Paper | Download Bibtex

Tool or Toy: Are SCA Tools Ready for Challenging Scenarios?

Published in Computers & Security, 2025

• This study proposes the first comprehensive evaluation model that integrates three core functions: dependency detection, vulnerability identification, and license recognition. It covers multi-language ecosystems, source and binary forms, and adversarial threats, addressing the limitations of previous research that focused on a single function or ecosystem.
• We construct a standardized test suite encompassing Java datasets, multi-language projects, diverse build methods, and adversarial scenarios. The datasets are drawn from academic literature and leading open-source repositories from the past five years. All datasets and ground-truth lists are open-sourced to support reproducibility.
• Through experiments on six datasets with four state-of-the-art tools (RA, CleanSource, OpenSCA, and Snyk), this study identifies weaknesses in both commercial and open-source solutions and proposes targeted optimizations.

Recommended citation: Congyan Shu, Wentao Chen, Zijie Huang, Guisheng Fan^*, Huiqun Yu^*, Hengrun Zhang, Yuguo Liang. Tool or Toy: Are SCA Tools Ready for Challenging Scenarios? Computers & Security, 2025, 158: 104624. https://doi.org/10.1016/j.cose.2025.104624. [CCF-B/SCI-Q2]
Download Paper | Download Bibtex

Automatic Code Summarization Using Abbreviation Expansion and Subword Segmentation

Published in Expert Systems, 2025

• We propose code abbreviation expansion to mitigate the negative impact of abbreviations on program understanding and to strengthen the language alignment ability of code summarization models. We adopt a series of context-based heuristic algorithms to expand the abbreviations embedded in the code snippets of Java code summarization datasets.
• We introduce the unigram subword segmentation algorithm to expose more semantic information and further enhance the program understanding performance of code summarization models. We develop code-specific tokenizers that split code-summary pairs into finer-grained, semantics-preserving subword sequences.
• We present Semantic Enhanced Transformer for Code Summarization (SETCS), a framework that better leverages the semantic information introduced by the above methods. We design a robust baseline that fuses the embeddings of the original and newly generated subtoken sequences, allowing it to capture critical information effectively.
• To the best of our knowledge, this is the first work to incorporate code abbreviation expansion and subword segmentation into the automatic code summarization task. These methods are model-agnostic and can be easily integrated into existing automatic code summarization approaches. Experiments on two widely used datasets demonstrate the effectiveness of the proposed methods.

Recommended citation: Yuguo Liang, Guisheng Fan^*, Huiqun Yu^*, Mingchen Li, Zijie Huang. Automatic Code Summarization Using Abbreviation Expansion and Subword Segmentation. Expert Systems, 2025, 42: e13835. https://doi.org/10.1111/exsy.13835. [CCF-C/SCI-Q4]
Download Paper | Download Bibtex

Aligning XAI explanations with software developers’ expectations: A case study with code smell prioritization

Published in Expert Systems with Applications, 2024

• We summarize the concerns that shape developers’ decisions about code smell criticality, including code design and implementation, code evolution, code functionality, and developer-related factors.
• To our knowledge, this is the first work to quantify the gap between XAI explanations and developers’ expectations in code smell prioritization. The gap can be large even when all of developers’ concerns are captured by the features; for example, more than 40% of developers’ concerns do not appear in simple explanations.
• We find that the gap can be narrowed to an acceptable extent by adapting feature selection to developers’ concerns, i.e., preserving the features related to developers’ major concerns as much as possible.
• We conclude that, once the gap is narrowed, inspecting the 3 to 5 most important features is sufficient to meet developers’ expectations when explaining issues with simpler causes, such as Spaghetti Code, but the explanations may be less helpful to novice users for issues with complex or controversial causes, such as Shotgun Surgery.
• We outline the challenges and opportunities of XAI for code smell prioritization and software quality assurance (SQA) in terms of feature engineering, problem definition, and XAI methodologies.

Recommended citation: Zijie Huang, Huiqun Yu^*, Guisheng Fan^*, Zhiqing Shao, Mingchen Li, Yuguo Liang. Aligning XAI explanations with software developers’ expectations: A case study with code smell prioritization. Expert Systems with Applications, 2024, 238: 121640. https://doi.org/10.1016/j.eswa.2023.121640. [CCF-C/SCI-Q1]
Download Paper | Download Bibtex

Conference Papers

JIT-Coka: An Improved Framework for Just-in-Time Defect Prediction and Localization Using Fused Features of Code Change

Published in 21st EAI International Conference on Collaborative Computing: Networking, Applications and Worksharing, 2025

• We perform comprehensive experiments on representative just-in-time defect prediction (JIT-DP) and defect localization (DL) models using the high-quality JIT-Defects4J dataset, and evaluate them with multiple metrics to fill the evaluation gap left by prior studies.
• We develop JIT-Coka, a model applicable to both JIT-DP and DL tasks. On the DP task, JIT-Coka significantly outperforms the current state-of-the-art model (JIT-Smart) in Precision and MCC, while maintaining comparable performance on DL.
• We conduct a comprehensive ablation study to validate the effectiveness of each component of JIT-Coka. Moreover, the implementation and trained models are made publicly available to facilitate future research.

Recommended citation: Yuguo Liang, Chengcheng Wu, Wentao Chen, Guisheng Fan^*, Huiqun Yu^*. JIT-Coka: An Improved Framework for Just-in-Time Defect Prediction and Localization Using Fused Features of Code Change. Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2025. [CCF-C]
Download Paper | Download Slides

Liang Yuguo