Usage patterns of software product metrics in assessing developers’ output: A comprehensive study

Published in Information and Software Technology, 2025

Context:
Accurate assessment of developers’ output is crucial for both software engineering research and industrial practice. This assessment often relies on software product metrics such as lines of code (LOC) and quality indicators from static analysis tools. However, existing research lacks a comprehensive understanding of how these product metrics are used in practice, and reliance on any single metric is increasingly vulnerable to manipulation, particularly with the emergence of large language models (LLMs).

Objectives:
This study aims to investigate (1) how developers can intentionally manipulate commonly used metrics such as LOC using LLMs, (2) whether complex efficiency metrics provide consistent advantages over simpler ones, and (3) the reliability and cost-effectiveness of quality metrics derived from static analysis tools such as SonarQube.

Methods:
We conduct empirical analyses in which three LLMs are used to manipulate metrics, and we evaluate the reliability of product metrics across nine open-source projects. We further validate our findings through a collaboration with a large financial institution facing fairness concerns in assessing developers’ output due to inappropriate metric usage.

Results:
We observe that developers can inflate LOC by an average of 60.86% using LLMs, leading to anomalous assessments. Complex efficiency metrics do not yield performance improvements consistent with their additional computational cost. Furthermore, quality metrics from SonarQube and PMD often fail to capture real quality changes and are expensive to compute. A software metric migration plan based on our findings effectively reduces evaluation anomalies in the industrial setting and standardizes developers’ commits, confirming the practical validity of our conclusions.
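
To make the manipulation concern concrete, below is a minimal, hypothetical Python sketch (not taken from the study): it shows how a naive newline-based LOC count can be inflated by a behavior-preserving rewrite of the kind an LLM can produce on request. The function and snippets are illustrative assumptions, not the paper’s measurement tooling.

```python
# Illustrative sketch (not from the paper): why a naive LOC metric is easy to inflate.
# The two snippets below are semantically equivalent, yet the expanded version
# triples the count reported by a simple line-based LOC measure.

def count_loc(source: str) -> int:
    """Count non-blank, non-comment lines -- a deliberately naive LOC metric."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

ORIGINAL = """
def total(prices):
    return sum(p * 1.1 for p in prices)
"""

# An LLM-style rewrite that preserves behavior but adds lines.
EXPANDED = """
def total(prices):
    result = 0.0
    for p in prices:
        taxed = p * 1.1
        result = result + taxed
    return result
"""

if __name__ == "__main__":
    before, after = count_loc(ORIGINAL), count_loc(EXPANDED)
    print(f"LOC before: {before}, after: {after}, "
          f"inflation: {(after - before) / before:.0%}")
```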

Conclusion:
Our findings highlight critical limitations in current metric practices and demonstrate how thoughtful usage patterns of product metrics can improve fairness in developer evaluation. This work bridges the gap between academic insights and industrial needs, offering practical guidance for more reliable assessment of developers’ output.

Recommended citation: Wentao Chen, Huiqun Yu*, Guisheng Fan*, Zijie Huang, Yuguo Liang. Usage Patterns of Software Product Metrics in Assessing Developers’ Output: A Comprehensive Study. Information and Software Technology, 2025, 189: 107935. https://doi.org/10.1016/j.infsof.2025.107935. [CCF-B/SCI-Q2]
Download Paper | Download BibTeX