Zahedeh, Zamanian (2019) Anomaly detection in system log files using machine learning algorithms / Zahedeh Zamanian. Masters thesis, University of Malaya.
Abstract
In recent years, the rapid growth of information technology and easy access to computers, digital devices and the internet have made security management and the investigation of malicious activity a main concern of organizations and governments. People are an organization's greatest asset, but they may also be its greatest threat because of their access to highly confidential information and their knowledge of organizational systems. Insider threat activity has a huge impact on business, so methods are needed to detect insider threats within an organization. Log files are a rich source of information that can help to detect, understand and predict these kinds of threats. However, the sheer size of the log files generated by systems makes manual log analysis impractical. Moreover, log files contain many irrelevant and redundant features that act as noise. Log files are also heterogeneous and cannot be fed directly into machine learning algorithms. Furthermore, many companies use signature-based detection, which cannot capture more advanced attackers who use unfamiliar attack methods. This study uses machine learning to detect anomalies in system log files. It uses the synthetic CERT Insider Threat v6.2 dataset, which covers five domains: file, logon/logoff, http, device and email. From the raw system log files, the study generates 200 features that can be fed to machine learning algorithms. Principal component analysis (PCA) is then used as a feature extraction method to obtain 117 independent and discriminative features that retain 95% of the variance. The unsupervised Isolation Forest and One Class SVM algorithms are applied to detect anomalies. Isolation Forest achieved an area under the curve (AUC) of 96.6% with PCA; without PCA, its lowest AUC was 76%. In contrast, the AUC for One Class SVM was 69.3% with PCA and 59.8% without PCA. Isolation Forest achieved a true positive rate (TPR) of 93.2% with PCA and 89.2% without PCA, whereas the TPR for One Class SVM was 68.1% with PCA and 55.4% without PCA. The highest false positive rate (FPR), 26%, was obtained by One Class SVM without PCA, and the lowest, 2.8%, by Isolation Forest with PCA.
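The pipeline described in the abstract (PCA retaining 95% of the variance, followed by unsupervised Isolation Forest and One Class SVM, scored by AUC, TPR and FPR) can be sketched roughly as follows. This is a minimal illustration and not the thesis code: it assumes scikit-learn, replaces the CERT Insider Threat features with a small synthetic toy dataset, and every function name, parameter value and data shape below is hypothetical. Its numbers are not comparable to the results reported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def make_toy_data(n_normal=1000, n_anomalous=20, n_features=20, seed=0):
    """Synthetic stand-in for per-user feature vectors (NOT the CERT dataset)."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(0.0, 1.0, size=(n_normal, n_features))
    anomalous = rng.normal(4.0, 1.0, size=(n_anomalous, n_features))
    X = np.vstack([normal, anomalous])
    # Labels are used only for evaluation, never for fitting.
    y = np.concatenate([np.zeros(n_normal, dtype=int),
                        np.ones(n_anomalous, dtype=int)])
    return X, y


def evaluate(model, X, y):
    """Fit an unsupervised detector on X and score it against labels y."""
    model.fit(X)
    # decision_function is higher for inliers, so negate it to get
    # an anomaly score where larger means more anomalous.
    scores = -model.decision_function(X)
    flagged = model.predict(X) == -1  # -1 marks predicted anomalies
    return {
        "auc": float(roc_auc_score(y, scores)),
        "tpr": float(flagged[y == 1].mean()),  # anomalies correctly flagged
        "fpr": float(flagged[y == 0].mean()),  # normal rows wrongly flagged
    }


def run_experiment(seed=0):
    """Compare both detectors with and without PCA on the toy data."""
    X, y = make_toy_data(seed=seed)
    X_scaled = StandardScaler().fit_transform(X)
    # A float n_components keeps the fewest components whose cumulative
    # explained variance reaches that fraction (here 95%).
    pca = PCA(n_components=0.95)
    X_pca = pca.fit_transform(X_scaled)

    factories = {
        "IsolationForest": lambda: IsolationForest(random_state=seed),
        "OneClassSVM": lambda: OneClassSVM(nu=0.05, gamma="scale"),
    }
    results = {}
    for name, make_model in factories.items():
        results[(name, "without PCA")] = evaluate(make_model(), X_scaled, y)
        results[(name, "with PCA")] = evaluate(make_model(), X_pca, y)
    return results, pca.n_components_


if __name__ == "__main__":
    results, n_components = run_experiment()
    print(f"PCA kept {n_components} components")
    for key, metrics in results.items():
        print(key, metrics)
```

In this sketch the TPR and FPR come from each model's default decision threshold, while the AUC is computed from the continuous anomaly scores and therefore does not depend on a threshold. Whether the thesis chose its thresholds the same way is not stated in the abstract.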
Item Type: | Thesis (Masters) |
---|---|
Additional Information: | Dissertation (M.A.) – Faculty of Computer Science & Information Technology, University of Malaya, 2019. |
Uncontrolled Keywords: | Anomaly detection; Machine learning; Insider threats; Feature extraction |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Computer Science & Information Technology |
Depositing User: | Mr Mohd Safri Tahir |
Date Deposited: | 27 Dec 2019 07:52 |
Last Modified: | 17 Aug 2020 07:23 |
URI: | http://studentsrepo.um.edu.my/id/eprint/10748 |