Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix

Ebubeogu Amarachukwu, Felix (2020) Supervised optimal decision machine learning approach to class- and method-level data preprocessing towards effective software defect prediction / Ebubeogu Amarachukwu Felix. PhD thesis, Universiti Malaya.

      Abstract

      Software defect prediction gives software teams actionable outputs and contributes to industrial success. Predicting the number of defects in a new version of software at both the class and method levels is therefore an important goal of defect prediction studies, because it helps teams optimize their test efforts and improve software quality. However, despite remarkable achievements in defect prediction, the quality of the data used in these studies remains a major concern, and data quality issues have led to numerous contradictory findings in machine learning research. In addition, a demonstrated approach for predicting the number of defects in a new software version is lacking. Efforts are therefore required to demonstrate how class- and method-level defect prediction can be achieved for a new software version and to develop an approach for preprocessing the highly imbalanced class- and method-level data available for software defect prediction. To address these issues, first, a data preprocessing framework is proposed to overcome some of the challenges associated with typical software datasets, for instance, irrelevant and redundant features. The framework is developed by following a machine-learning-driven, supervised optimal decision procedure, and its main advantage is that it yields bias-free method- and class-level datasets. Second, a method is proposed for predicting the number of software defects in an upcoming product release. It uses predictor variables derived from the defect acceleration observed in the existing software defects, namely, the defect density, defect velocity and defect introduction time. Because this defect acceleration characterizes the number of defects in the current version of a software product, the derived predictor variables can be used to construct regression models that predict the number of software defects in a new version.
Experiments are reported on 69 open-source ELFF Java projects, containing 131,034 classes and 289,132 methods, and on the NASA datasets, which contain 10 different Java and C++ projects with 22,838 classes. To evaluate the effectiveness of the proposed data preprocessing framework, the average classification performances of six selected state-of-the-art classifiers before and after data preprocessing are compared across multiple projects with imbalances between the defective and defect-free classes. At both the class and method levels, these classifiers, namely, naïve Bayes, logistic regression, neural network, K-nearest neighbors, support vector machine and random forest, achieve noteworthy performance when applied to the preprocessed datasets. Moreover, for the ELFF projects, the results at the class and method levels respectively show correlation coefficients of 61% and 60% for the defect density, -11% and -4% for the defect introduction time, and 94% and 93% for the defect velocity (consistent results are also obtained for the NASA datasets, as presented in the results section). The proposed approach can serve as a blueprint for program testing to enhance the effectiveness of software development activities.
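
As a rough illustration of the kind of analysis the abstract describes, the sketch below computes a Pearson correlation coefficient between one derived predictor and the defect count, then fits a one-variable least-squares regression line. It is not the thesis's implementation: the abstract does not define how the predictors are computed, so the sample values, the variable names (`defect_velocity`, `defect_count`) and the helper functions `pearson` and `fit_line` are all assumptions made for illustration. The percentages quoted in the abstract presumably correspond to coefficients on the usual -1 to 1 scale (for example, 94% as 0.94).

```python
from math import sqrt
from typing import Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("cannot fit a line to a constant predictor")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values, invented for illustration only (not thesis data).
# Each position stands for one class or method in the current version.
defect_velocity = [1.0, 2.0, 3.0, 4.0, 5.0]
defect_count = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson(defect_velocity, defect_count)
slope, intercept = fit_line(defect_velocity, defect_count)

# Predicted defect count for a hypothetical unit with velocity 6.0.
predicted = slope * 6.0 + intercept
print(f"r = {r:.2f}, prediction = {predicted:.1f}")
```

A single predictor keeps the sketch short. A regression model over all three derived predictors would need a multivariate fit, which is outside what this example shows.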

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) – Faculty of Computer Science & Information Technology, Universiti Malaya, 2020.
      Uncontrolled Keywords: Machine learning; Software defect; Defect prediction; Data preprocessing; Defect velocity
      Subjects: Q Science > QA Mathematics > QA76 Computer software; T Technology > TA Engineering (General). Civil engineering (General)
      Divisions: Faculty of Computer Science & Information Technology > Dept of Software Engineering
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 05 Jul 2023 07:29
      Last Modified: 05 Jul 2023 07:29
      URI: http://studentsrepo.um.edu.my/id/eprint/14571
