Evaluating the Agreement among Technical Debt Measurement Tools: Building an Empirical Benchmark of Technical Debt Liabilities

Technical Debt Benchmarking

Software teams are often asked to deliver new features within strict deadlines leading developers to deliberately or inadvertently serve “not quite right code” compromising software quality and maintainability. This non-ideal state of software is efficiently captured by the Technical Debt (TD) metaphor, which reflects the additional effort that has to be spent to maintain software. Although several tools are available for assessing TD, each tool essentially checks software against a particular ruleset. The use of different rulesets can often be beneficial as it leads to the identification of a wider set of problems; however, for the common usage scenario where developers or researchers rely on a single tool, diverse estimates of TD and the identification of different mitigation actions limits the credibility and applicability of the findings. The objective of this study is two-fold: First, we evaluate the degree of agreement among leading TD assessment tools. Second, we propose a framework to capture the diversity of the examined tools with the aim of identifying few “reference assessments” (or class/file profiles) representing characteristic cases of classes/files with respect to their level of TD. By extracting sets of classes/files exhibiting similarity to a selected profile (e.g., that of high TD levels in all employed tools) we establish a basis that can be used either for prioritization of maintenance activities or for training more sophisticated TD identification techniques. The proposed framework is illustrated through a case study on fifty (50) open source projects and two programming languages (Java and JavaScript) employing three leading TD tools.

Go to TD Benchmarker tool
Download dataset (25 Java and 25 JavaScript projects)