A partition based feature selection approach for mixed data clustering / Ashish Dutt

Ashish, Dutt (2020) A partition based feature selection approach for mixed data clustering / Ashish Dutt. PhD thesis, Universiti Malaya.

      Abstract

      Educational institutions now compile and store large volumes of data, such as student enrolment and attendance records and examination results. Mining such data yields useful information for the institutions that hold it. The rapid growth of educational data means that distilling it requires more sophisticated algorithms, which led to the emergence of the field of Educational Data Mining (EDM). Traditional data mining algorithms cannot be applied directly to educational problems, as such problems may have a specific objective and function. A pre-processing algorithm therefore has to be applied first, and only then can specific data mining methods be applied to the problem. One such pre-processing algorithm in EDM is clustering, a widely used data mining method that discovers patterns in data by analysing its features. A feature contains a measured value. A value can be of an atomic type, either categorical (text only) or numerical (number only). A categorical data type can be ordinal (ordered) or nominal (unordered). In either case, the feature is of a univariate data type. In real-world environments, data often consist of both categorical and numerical features. Such datasets are called mixed data. Several clustering methods exist in the literature for analysing numerical or categorical data, while a few clustering algorithms handle mixed data. Clustering mixed data depends on the dissimilarities of its constituent features, and this dependence on data types may influence the clustering solution. Assigning appropriate weights to the features, so as to diminish the influence of data type, may improve the performance of a partition clustering algorithm. In this thesis, a novel weighted feature selection approach on nominal features is proposed for a partition clustering algorithm that can handle mixed data.
The proposed approach exploits the pre-processing nature of the partition clustering algorithm in selecting the weights assigned to nominal features. The benefits of weighting are demonstrated on both simulated and real-world mixed datasets. The experimental results show that weighting nominal features improves mixed data clustering.
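
To make the idea of a feature-weighted dissimilarity for mixed data concrete, the following is a minimal sketch and not the method developed in the thesis. It assumes a k-prototypes-style measure: squared Euclidean distance on the numerical features plus a weighted simple-matching count on the nominal features. The function name `mixed_dissimilarity`, the example records, and the weight values are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mixed_dissimilarity(
    a_num: Sequence[float],
    b_num: Sequence[float],
    a_nom: Sequence[str],
    b_nom: Sequence[str],
    nominal_weights: Sequence[float],
) -> float:
    """Dissimilarity between two mixed-type records (illustrative only).

    Numerical part: squared Euclidean distance.
    Nominal part: each mismatching nominal feature adds its own weight
    (simple matching, scaled per feature). The weights are assumed to be
    supplied by the caller; how to choose them is not shown here.
    """
    if len(a_num) != len(b_num):
        raise ValueError("numerical feature vectors differ in length")
    if not (len(a_nom) == len(b_nom) == len(nominal_weights)):
        raise ValueError("nominal features and weights differ in length")

    numeric_part = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    nominal_part = sum(
        w for x, y, w in zip(a_nom, b_nom, nominal_weights) if x != y
    )
    return numeric_part + nominal_part


# Two hypothetical student records: two numerical and two nominal features.
student_a = ([1.0, 3.0], ["full-time", "science"])
student_b = ([2.0, 1.0], ["part-time", "science"])

# Unweighted: every nominal mismatch counts as 1.
unweighted = mixed_dissimilarity(*student_a[:1], *student_b[:1],
                                 student_a[1], student_b[1], [1.0, 1.0])

# Weighted: the first nominal feature is down-weighted to 0.5.
weighted = mixed_dissimilarity(*student_a[:1], *student_b[:1],
                               student_a[1], student_b[1], [0.5, 2.0])
```

In this sketch, lowering the weight of a nominal feature reduces how much a mismatch on that feature contributes relative to the numerical distance, which is the kind of data-type influence the abstract describes.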

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) – Faculty of Computer Science & Information Technology, Universiti Malaya, 2020.
      Uncontrolled Keywords: Clustering; Educational data mining; Mixed data; Algorithms; CVM method
      Subjects: Q Science > QA Mathematics > QA76 Computer software
      Divisions: Faculty of Computer Science & Information Technology
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 08 Jun 2023 01:29
      Last Modified: 08 Jun 2023 01:29
      URI: http://studentsrepo.um.edu.my/id/eprint/14481
