A comparison of clustering algorithms for data anonymization / Zahra Mahmoud

Zahra, Mahmoud (2019) A comparison of clustering algorithms for data anonymization / Zahra Mahmoud. Masters thesis, University of Malaya.

      Abstract

      Organizations today can easily store massive amounts of data because the cost of storage has fallen sharply over the years. They use this data to raise the value of their brands. However, as data becomes easier to store in bulk, the security risk also increases. In the last two years alone, multiple data leaks have been reported, the most recent from the Ministry of Education in Malaysia. Data security has been researched extensively over the years. The literature review showed that many studies have employed methods such as data encryption or privacy protection data publishing (PPDP). This thesis focuses on the latter, as data encryption has proven to be more costly. Much of the literature also relies on generalization and suppression to achieve the required level of anonymity. However, heavily suppressed or generalized data may paint a different picture from the original data. The objective of this thesis is to find a data anonymization method that is efficient and produces the lowest percentage of information loss. After comparing multiple types of PPDP, the researcher determined that the clustering method is the best fit for this purpose. Next, multiple existing clustering algorithms are compared to determine which performs best. Finally, the researcher created an enhanced method for a final comparison: the distance function was manipulated to show how differences in cluster distance can affect the anonymized dataset.
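
The abstract does not specify the algorithms, distance functions, or loss metric used in the thesis, so the following is only a generic sketch of the idea it describes: clustering records into groups of at least k, generalizing each group to value ranges, and measuring how the choice of distance function changes the information loss. The function names (`greedy_k_clusters`, `information_loss`), the greedy heuristic, the range-based loss measure, and the sample records are all assumptions made for illustration, not the author's method.

```python
from typing import Callable, List, Sequence, Tuple

Record = Tuple[float, ...]
Distance = Callable[[Record, Record], float]


def euclidean(a: Record, b: Record) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Record, b: Record) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def greedy_k_clusters(records: Sequence[Record], k: int,
                      dist: Distance) -> List[List[Record]]:
    """Greedy heuristic (an assumption): every cluster gets at least k records."""
    if k < 1 or len(records) < k:
        raise ValueError("need at least k records and k >= 1")
    remaining = list(records)
    clusters: List[List[Record]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Group the seed with its k - 1 nearest neighbours under `dist`.
        remaining.sort(key=lambda r: dist(seed, r))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k records are left: attach each to the nearest cluster seed.
    for r in remaining:
        min(clusters, key=lambda c: dist(c[0], r)).append(r)
    return clusters


def information_loss(clusters: Sequence[Sequence[Record]],
                     records: Sequence[Record]) -> float:
    """Average generalized range width relative to the full attribute range.

    0.0 means no generalization; 1.0 means every record is generalized
    to the full range of every attribute.
    """
    dims = len(records[0])
    widths = [max(r[d] for r in records) - min(r[d] for r in records)
              for d in range(dims)]
    total = 0.0
    for cluster in clusters:
        for d in range(dims):
            if widths[d] > 0:
                span = max(r[d] for r in cluster) - min(r[d] for r in cluster)
                total += len(cluster) * span / widths[d]
    return total / (len(records) * dims)


# Hypothetical (age, salary) records used only for illustration.
sample: List[Record] = [(25, 50000), (27, 52000), (60, 90000), (62, 91000)]

for name, dist in (("euclidean", euclidean), ("manhattan", manhattan)):
    groups = greedy_k_clusters(sample, k=2, dist=dist)
    print(name, round(information_loss(groups, sample), 4))
```

Passing a different `dist` function is the only change needed to compare how a distance measure affects the resulting clusters and their information loss. Real quasi-identifiers would usually need normalization and handling of categorical attributes, which this sketch leaves out.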

      Item Type: Thesis (Masters)
      Additional Information: Dissertation (M.A.) – Faculty of Computer Science & Information Technology, University of Malaya, 2019.
      Uncontrolled Keywords: Data anonymization; Privacy protection data publishing (PPDP); Information loss; Algorithms; Data encryption
      Subjects: Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
      Divisions: Faculty of Computer Science & Information Technology
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 24 Dec 2019 04:26
      Last Modified: 17 Aug 2020 07:22
      URI: http://studentsrepo.um.edu.my/id/eprint/10708
