Ghulam, Mujtaba (2018) Feature engineering techniques to classify cause of death from forensic autopsy reports / Ghulam Mujtaba. PhD thesis, University of Malaya.
Abstract
Forensic autopsy aims to reveal the cause of death (CoD) by examining a dead body. Medical pathologists perform this process during the investigation of criminal and civil law cases. In a forensic autopsy, pathologists examine the corpse externally and anatomically to collect autopsy findings. They also collect the history of the deceased and information about the death scene from the deceased's relatives and eyewitnesses. The pathologists then determine the CoD through their expert knowledge, correlating the current autopsy findings with previous autopsy reports. As a result, determining the CoD from autopsy findings is laborious, time consuming, and subject to the inconsistencies associated with any labor-intensive process. Automated text classification (ATC) techniques must therefore be employed to overcome these issues. This study aimed to employ ATC techniques to classify the CoD from forensic autopsy reports.

In ATC, feature engineering is a highly important step because the success or failure of any ATC model depends heavily on the quality of the features used in the classification task. The traditional feature engineering techniques in ATC include bag of words (BoW) and n-gram. This study argues that BoW and its variant techniques are inadequate for determining the CoD from forensic autopsy reports because they ignore word order, word context, and word-level synonymy and polysemy. To overcome these limitations, this study pursued four main objectives. First, it investigated the existing feature engineering techniques for classifying free-text clinical reports, including forensic autopsy reports. Second, it developed a semi-automated, expert-driven feature engineering technique to overcome the issue of word-level synonymy and polysemy. Third, it proposed a fully automated conceptual graph-based feature engineering technique to address the issues of word order and word context. Finally, it evaluated the proposed techniques by comparing their performance with existing baseline techniques.

For the experimental evaluation, forensic autopsy reports covering 16 different CoDs were obtained from a very large hospital in Kuala Lumpur, Malaysia. These reports were preprocessed using various text preprocessing techniques. Discriminative features were then extracted from the preprocessed reports through the proposed feature engineering techniques and assembled into numeric master feature vectors. These master feature vectors were fed as input to six machine learning algorithms to construct and evaluate the classification models. To show the effectiveness of the proposed techniques, this study also compared their performance with five state-of-the-art baseline feature engineering techniques. Experimental results showed that the proposed techniques outperformed the traditional BoW and its variant techniques. Moreover, the support vector machine and random forest algorithms outperformed the four other algorithms. The proposed techniques are feasible and practical for determining the CoD from forensic autopsy reports and can help pathologists determine the CoD from autopsy findings accurately and rapidly. Finally, the proposed techniques are generally applicable to other kinds of free-text clinical reports.
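The abstract's point that BoW ignores word order can be illustrated with a minimal sketch. This is not code from the thesis: the two phrases are invented for illustration, and the helper names `bag_of_words` and `ngrams` are hypothetical. It only shows that two phrases with different meanings can share one BoW representation, while bigrams keep some local order.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens; word order is discarded."""
    return Counter(text.lower().split())


def ngrams(text, n=2):
    """Count contiguous n-token sequences, which preserves local word order."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Two invented phrases (not taken from any autopsy report) with different meanings.
a = "fracture without haemorrhage"
b = "haemorrhage without fracture"

# The same three tokens occur once each in both phrases, so BoW cannot tell them apart.
same_bow = bag_of_words(a) == bag_of_words(b)

# The bigrams differ because they record which word follows which.
same_bigrams = ngrams(a) == ngrams(b)

print(same_bow, same_bigrams)
```

Bigrams recover only adjacent-word order; they do not address synonymy, polysemy, or wider context, which the abstract lists as further limitations of BoW and its variants.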
Item Type: | Thesis (PhD) |
---|---|
Additional Information: | Thesis (PhD) – Faculty of Computer Science & Information Technology, University of Malaya, 2018. |
Uncontrolled Keywords: | Automated text classification techniques; Forensic autopsy reports; Supervised machine learning algorithms; Feature engineering techniques; Free-text clinical reports |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Computer Science & Information Technology |
Depositing User: | Mr Mohd Safri Tahir |
Date Deposited: | 06 Jan 2020 02:57 |
Last Modified: | 09 Feb 2021 04:34 |
URI: | http://studentsrepo.um.edu.my/id/eprint/10667 |