Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy

Ganesh, Krishnasamy (2019) Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy. PhD thesis, University of Malaya.

PDF (The Candidate's Agreement), 192Kb. Restricted to repository staff only.
PDF (Thesis PhD), 903Kb. Restricted to repository staff only until 31 December 2021.

      Abstract

      Feature selection and classification are widely used in data analysis. Recently, considerable progress has been made in semi-supervised multi-task feature selection algorithms, which exploit information shared across multiple related tasks. However, these algorithms are designed for single-view data and cannot naturally handle multi-view data. Existing studies have demonstrated that mining the information contained in multiple views can substantially improve feature selection performance. As for classification, researchers have applied semi-supervised learning to the extreme learning machine (ELM), exploiting both labeled and unlabeled data to boost learning performance. These methods incorporate Laplacian regularization to capture the geometry of the underlying manifold. However, Laplacian regularization lacks extrapolating power and biases the solution towards a constant function. These drawbacks degrade the performance of Laplacian-regularized semi-supervised ELMs when only a few labeled samples are available.

      In the first part of the study, a novel mathematical framework is introduced for multi-view Laplacian semi-supervised feature selection that mines the correlations among multiple tasks. The proposed algorithm exploits complementary information from the different feature views in each task while exploring the knowledge shared between multiple related tasks in a joint framework when labeled training data are scarce. Because the objective function is non-smooth and difficult to solve, an efficient iterative algorithm is developed to optimize it. The proposed algorithm is compared with state-of-the-art feature selection algorithms on three datasets: a consumer video dataset, a 3D motion recognition dataset and a handwritten digit recognition dataset.
      In these experiments, all training and testing data are represented as feature vectors. The proposed algorithm learns sparse coefficients by exploiting the relationships among the different multi-view features and leveraging the knowledge from multiple related tasks. The sparse coefficients are then applied to the feature vectors of both the training and testing data to select the most representative features, which are fed into a linear support vector machine (SVM) for classification. The experimental results show that the proposed feature selection framework outperforms other state-of-the-art feature selection algorithms.

      In the second part of the study, a novel classification algorithm called Hessian semi-supervised ELM (HSS-ELM) is proposed to enhance the semi-supervised learning of ELM. Unlike Laplacian regularization, Hessian regularization favours functions whose values vary linearly with geodesic distance and preserves the local manifold structure well, which leads to good extrapolating power. Furthermore, HSS-ELM retains almost all the advantages of the traditional ELM, such as high training efficiency and straightforward implementation for multiclass classification problems. The proposed algorithm is tested on publicly available datasets, including G50C, COIL20(B), COIL20, USPST(B) and USPST. The experimental results demonstrate that the proposed algorithm is competitive with state-of-the-art semi-supervised learning algorithms in terms of accuracy. Additionally, HSS-ELM requires remarkably less training time than semi-supervised SVMs and regularized least-squares algorithms.
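
The feature selection step in the abstract (apply learned sparse coefficients to the training and testing feature vectors, keep the most representative features) can be illustrated with a minimal sketch. The abstract does not say how the coefficients are turned into a selection, so this assumes the common convention of ranking features by the l2 norm of their rows in a coefficient matrix `W` of shape (features, classes). The function name `select_features`, the toy `W` and the random data are all hypothetical, and the linear SVM step is left out.

```python
import numpy as np

def select_features(W, k):
    """Return the indices of the k features whose rows in W have the largest l2 norm.

    Assumption: W is a (n_features, n_classes) sparse coefficient matrix and
    row norms are used as feature scores. The thesis may use a different rule.
    """
    scores = np.linalg.norm(W, axis=1)       # one score per feature
    top = np.argsort(scores)[::-1][:k]       # highest scores first
    return np.sort(top)                      # keep original feature order

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 6))           # toy training feature vectors
X_test = rng.normal(size=(5, 6))             # toy testing feature vectors

# Hypothetical learned coefficients: only features 1 and 4 carry weight.
W = np.zeros((6, 3))
W[1] = [0.9, -0.4, 0.2]
W[4] = [0.1, 0.8, -0.5]

idx = select_features(W, k=2)
X_train_sel = X_train[:, idx]                # same columns for both splits
X_test_sel = X_test[:, idx]
```

The same index set is applied to both splits, so a classifier trained on `X_train_sel` sees the same features at test time.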
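
The role of the manifold regularizer in a semi-supervised ELM can be sketched as follows. This is not the thesis's HSS-ELM. It assumes a generic formulation in which a random sigmoid hidden layer is followed by a closed-form ridge-style solve with a graph penalty matrix `R`, and it uses a k-nearest-neighbour graph Laplacian for `R`, which is the Laplacian-regularized baseline the abstract contrasts against. A Hessian-regularized variant would supply a Hessian energy matrix for `R` instead, and estimating that matrix is not shown here. All function names, hyperparameters and the toy data are hypothetical.

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Unnormalised graph Laplacian of a symmetrised k-nearest-neighbour graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                  # exclude self-neighbours
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                        # make the graph undirected
    return np.diag(A.sum(axis=1)) - A

def hidden(X, W_in, b):
    """Random-feature hidden layer with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W_in + b)))

def fit_ss_elm(X, Y, labeled, R, n_hidden=50, c=10.0, lam=0.1, seed=0):
    """Solve for output weights with a label-fit term and a graph penalty.

    Assumed objective: ||beta||^2 + c * (fit on labeled rows) + lam * f^T R f,
    where f = H beta. Input weights are random and never trained.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = hidden(X, W_in, b)
    C = np.diag(np.where(labeled, c, 0.0))        # unlabeled rows get zero weight
    lhs = np.eye(n_hidden) + H.T @ C @ H + lam * (H.T @ R @ H)
    beta = np.linalg.solve(lhs, H.T @ C @ Y)
    return W_in, b, beta

# Toy data: two well-separated blobs, three labeled points per class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(30, 2)),
               rng.normal(2.0, 0.5, size=(30, 2))])
y = np.repeat([0, 1], 30)
labeled = np.zeros(60, dtype=bool)
labeled[[0, 1, 2, 30, 31, 32]] = True
Y = np.eye(2)[y] * labeled[:, None]               # one-hot targets, zero if unlabeled

R = knn_laplacian(X)                              # stand-in for a Hessian energy matrix
W_in, b, beta = fit_ss_elm(X, Y, labeled, R)
pred = (hidden(X, W_in, b) @ beta).argmax(axis=1)
acc = (pred[~labeled] == y[~labeled]).mean()      # accuracy on the unlabeled points
```

Because only the output weights are solved for, training reduces to one linear system of size `n_hidden`, which is consistent with the training efficiency the abstract attributes to ELM-based methods.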

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) - Faculty of Engineering, University of Malaya, 2019.
      Uncontrolled Keywords: Feature selection; Semi-supervised learning; Multi-view learning; Multi-task learning; Extreme learning machine
      Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
      Divisions: Faculty of Engineering
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 06 Feb 2020 01:21
      Last Modified: 06 Feb 2020 01:21
      URI: http://studentsrepo.um.edu.my/id/eprint/10006
