Real-time anomaly detection using clustering in big data technologies / Riyaz Ahamed Ariyaluran Habeeb

Riyaz Ahamed, Ariyaluran Habeeb (2019) Real-time anomaly detection using clustering in big data technologies / Riyaz Ahamed Ariyaluran Habeeb. PhD thesis, Universiti Malaya.



      The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft and cyber war. Hence, network security analytics has become an important area of concern and has lately gained intensive attention among researchers, specifically in the domain of network anomaly detection, which is considered crucial for network security. However, critical reviews have identified that existing approaches are inefficient at processing data to detect anomalies because of the massive volumes of data amassed through connected devices. Therefore, it is crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. In this regard, this research attempted to address the issue of accuracy in real-time anomaly detection. To begin with, the existing state-of-the-art techniques related to anomaly detection, real-time big data technologies and machine learning algorithms were critically reviewed to identify the problems. Subsequently, a comparative analysis was carried out to further establish the problems, using various existing algorithms that were validated on three openly available datasets. Based on the outcome of the analysis, this research proposed a novel framework, namely real-time anomaly detection based on big data technologies (RTADBDT), along with supporting implementation algorithms. The framework comprises BroIDS, Flume, Kafka, Spark Streaming, Spark MLlib, Matplot and HBase. BroIDS processes the existing datasets and generates various log files, such as the HTTP log used in this research, while the Flume component reads and tracks the incoming packet data blocks. Kafka holds a repository of messages categorized into topics, with each topic further divided into numerous partitions, each comprising an ordered, immutable sequence of messages.
Meanwhile, Spark Streaming provides a high-level abstraction known as a DStream, which represents a continuous stream of data, whereas Spark MLlib supplies the algorithmic optimizations that are applied in the proposed algorithms. Finally, the processed data was visualised using Matplot and stored in HBase. The proposed framework was validated to substantiate its efficacy, particularly in terms of accuracy, memory consumption and execution time, by performing critical comparative analysis using internal, external and statistical techniques. The performance of the proposed framework was assessed using mathematical expressions derived in this research as well as through comparative analysis. All of the analyses showed that the proposed framework outperformed existing techniques in terms of accuracy, memory consumption and execution time. The significance of this research lies in its contribution to the body of knowledge, with the proposed framework serving as a backbone for real-time anomaly detection with increased accuracy, minimised memory consumption and shortened execution time. Furthermore, when implemented, this framework should enable an organization to detect anomalies in real time, with potential for more effective fault tolerance and scalability.
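
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's RTADBDT algorithm and does not use the Spark Streaming or MLlib APIs; it is a plain-Python stand-in that assumes k-means clustering with a distance-to-centroid threshold as the anomaly rule. The function names (`fit_kmeans`, `flag_anomalies`), the sample points and the threshold value are all hypothetical, chosen only for illustration. Each list passed to `flag_anomalies` plays the role of one micro-batch from a stream.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> Tuple[int, float]:
    """Return (index, distance) of the centroid closest to `point`."""
    distances = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=distances.__getitem__)
    return idx, distances[idx]


def fit_kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> List[Point]:
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            idx, _ = nearest(p, centroids)
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def flag_anomalies(batch: Sequence[Point], centroids: Sequence[Point],
                   threshold: float) -> List[Point]:
    """Flag points whose distance to the nearest centroid exceeds `threshold`."""
    return [p for p in batch if nearest(p, centroids)[1] > threshold]


# Hypothetical historical data forming two well-separated groups.
TRAIN: List[Point] = [
    (0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
    (1.0, 0.0), (10.0, 11.0), (11.0, 10.0),
]

if __name__ == "__main__":
    model = fit_kmeans(TRAIN, k=2)
    # Two hypothetical micro-batches arriving one after the other.
    for batch in ([(0.5, 0.5), (10.2, 10.4)], [(5.0, 5.0), (0.2, 0.1)]):
        print(flag_anomalies(batch, model, threshold=2.0))
```

In this sketch the model is fitted once on historical data and then applied to each incoming batch; a streaming deployment might instead update the centroids as new batches arrive, which is a design choice this example leaves out.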

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) – Faculty of Computer Science & Information Technology, Universiti Malaya, 2019.
      Uncontrolled Keywords: Anomaly detection; Real-time big data processing; Clustering; Spark streaming; MLlib
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      Divisions: Faculty of Computer Science & Information Technology
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 31 Mar 2022 03:04
      Last Modified: 31 Mar 2022 03:04
