A neighbourhood undersampling stacked ensemble with H-measure maximising meta-learner for imbalanced classification / Seng Zian

Seng, Zian (2021) A neighbourhood undersampling stacked ensemble with H-measure maximising meta-learner for imbalanced classification / Seng Zian. PhD thesis, Universiti Malaya.


      Abstract

      A stacked ensemble uses a meta-learner to combine (stack) the predictions of multiple base classifiers. However, it tends to perform suboptimally in imbalanced classification, and several underlying difficulty factors have been reported as responsible for this performance degradation. This research aims to improve the classification performance of the stacked ensemble on imbalanced datasets by investigating the stacked ensemble’s meta-learner and the underlying difficulty factors (i.e., class imbalance, class overlapping, and class noise). Since the stacked ensemble’s performance on imbalanced data depends on the configuration of its meta-learner, Experiment 1 was conducted to identify the best-performing type of meta-learner. Its results showed that weighted combination-based meta-learners outperformed the other types of meta-learners, and that the AUC-maximising meta-learner was one of the best-performing weighted combination-based meta-learners. Motivated by the strong performance of the AUC-maximising meta-learner in Experiment 1 and by the importance of the H-measure in the literature, a new weighted combination-based meta-learner that maximises the H-measure (the H-measure maximising meta-learner) was proposed. Experiment 2 evaluated this meta-learner by benchmarking it against the top three meta-learners from Experiment 1, and the proposed meta-learner achieved superior classification performance. The research then investigated the stacked ensemble’s performance degradation from the perspective of the underlying difficulty factors in imbalanced datasets, and proposed a stacked ensemble named the Neighbourhood Undersampling Stacked Ensemble (NUS-SE). NUS-SE consists of two proposed components: the US-SE framework and Neighbourhood Undersampling.
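The abstract names Neighbourhood Undersampling as a component of NUS-SE but does not describe how instances are selected. As a rough illustration of the general idea of removing majority-class instances from overlapping or noisy regions, the sketch below applies an Edited-Nearest-Neighbours-style rule to the majority class only. It is not the thesis's algorithm: the function name `neighbourhood_undersample`, the Euclidean distance, and the parameters `k` and `min_minority_neighbours` are hypothetical choices, and the US-SE framework is not modelled.

```python
import math


def neighbourhood_undersample(X, y, minority=1, k=3, min_minority_neighbours=2):
    """Return the indices kept after removing majority instances surrounded by minority ones.

    Hypothetical ENN-style rule for illustration only: a majority-class instance
    is dropped when at least `min_minority_neighbours` of its k nearest
    neighbours (Euclidean distance) belong to the minority class.
    Minority instances are never removed.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == minority:
            keep.append(i)
            continue
        # k nearest neighbours of instance i, excluding i itself (ties broken by index).
        nearest = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)[:k]
        n_minority = sum(1 for _, j in nearest if y[j] == minority)
        if n_minority < min_minority_neighbours:
            keep.append(i)  # not in an overlapping/noisy region, so keep it
    return keep
```

With the toy data `X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (5.5, 5.5)]` and `y = [0, 0, 0, 0, 1, 1, 1, 0]`, the call `neighbourhood_undersample(X, y)` returns `[0, 1, 2, 3, 4, 5, 6]`: the majority instance at `(5.5, 5.5)` lies inside the minority cluster and is removed, while the majority cluster near the origin is untouched.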
      Experiment 3 evaluated the performance of the proposed NUS-SE. Since NUS-SE can be integrated with any meta-learner, the top three meta-learners from Experiment 1 and the proposed H-measure maximising meta-learner were each used as the meta-learner of NUS-SE in Experiment 3. Based on Experiment 3’s results, NUS-SE with the H-measure maximising meta-learner (NUS-SE-H) outperformed all of the original, unmodified stacked ensembles with different meta-learners, as well as the proposed NUS-SE with the other top-performing meta-learners (i.e., NUS-SE-AUC, NUS-SE-CCLL and NUS-SE-NNLS).
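A weighted combination-based meta-learner such as the H-measure or AUC maximising ones can be pictured as a search for weights over base-classifier scores that maximises an evaluation metric. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation. The weights are assumed to lie on the probability simplex and are found by random search, since the H-measure is piecewise constant in the weights and gradients are of no help. The H-measure uses the Beta(2, 2) cost distribution suggested in Hand's original definition and is integrated numerically; the thesis may use a different distribution or optimiser. All function names (`roc_points`, `h_measure`, `auc`, `combine`, `fit_weights`) are hypothetical. Passing `metric=auc` gives an AUC-maximising variant for comparison.

```python
import random


def roc_points(scores, labels):
    """ROC vertices (FPR, TPR) from (0, 0) to (1, 1); labels: 1 = positive, 0 = negative."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    points = [(0.0, 0.0)]  # threshold above every score: all predicted negative
    tp = fp = i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move tied scores across the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def h_measure(scores, labels, n_grid=500):
    """Approximate H-measure with a Beta(2, 2) cost weighting (midpoint rule over c)."""
    pi1 = sum(labels) / len(labels)  # positive-class prior
    pi0 = 1.0 - pi1
    points = roc_points(scores, labels)
    loss = loss_max = 0.0
    for k in range(n_grid):
        c = (k + 0.5) / n_grid  # normalised cost of a false positive
        weight = 6.0 * c * (1.0 - c)  # Beta(2, 2) density
        # Loss is linear in (FPR, TPR), so its minimum over the ROC vertices
        # equals its minimum over the ROC convex hull.
        loss += weight * min(c * pi0 * fpr + (1.0 - c) * pi1 * (1.0 - tpr)
                             for fpr, tpr in points)
        # Best trivial rule: classify everything as negative or everything as positive.
        loss_max += weight * min(c * pi0, (1.0 - c) * pi1)
    return 1.0 - loss / loss_max


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def combine(base_scores, weights):
    """Weighted sum of scores; base_scores[i][j] is classifier j's score for instance i."""
    return [sum(w * s for w, s in zip(weights, row)) for row in base_scores]


def fit_weights(base_scores, labels, metric=h_measure, n_candidates=200, seed=0):
    """Random search over the probability simplex for weights maximising `metric`."""
    m = len(base_scores[0])
    rng = random.Random(seed)
    # Always try each single base classifier and the plain average.
    candidates = [[1.0 if j == k else 0.0 for j in range(m)] for k in range(m)]
    candidates.append([1.0 / m] * m)
    for _ in range(n_candidates):
        # Normalised Exp(1) draws are uniformly distributed on the simplex.
        g = [rng.gammavariate(1.0, 1.0) for _ in range(m)]
        total = sum(g)
        candidates.append([x / total for x in g])
    best_w, best_score = None, float("-inf")
    for w in candidates:
        score = metric(combine(base_scores, w), labels)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

For example, with `base_scores = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.2], [0.3, 0.8], [0.2, 0.3], [0.1, 0.7]]` and `labels = [1, 1, 1, 0, 0, 0]`, `fit_weights(base_scores, labels)` returns `([1.0, 0.0], 1.0)`, because the first base classifier alone separates the classes perfectly. In a real stacked ensemble, the meta-learner would typically be fitted on out-of-fold base-classifier predictions rather than on scores from the data used to train the base classifiers.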

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) – Faculty of Computer Science & Information Technology, Universiti Malaya, 2021.
      Uncontrolled Keywords: Imbalanced classification; AUC-maximising meta-learner; Classification performance; NUS-SE; H-measure
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      T Technology > T Technology (General)
      Divisions: Faculty of Computer Science & Information Technology > Dept of Information System
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 18 Feb 2024 01:45
      Last Modified: 18 Feb 2024 01:45
      URI: http://studentsrepo.um.edu.my/id/eprint/14776
