Vithyatheri, Govindan (2024) A hyperbolic sarcasm detection model using negative sentiment tweets and machine learning algorithms / Vithyatheri Govindan. PhD thesis, Universiti Malaya.
PDF (The Candidate's Agreement): Restricted to Repository staff only. Download (216Kb)
PDF (Thesis PhD): Restricted to Repository staff only until 31 December 2025. Download (1602Kb)
Abstract
Social media platforms contain rich language resources and are a valuable source for analysing people’s sentiment around the globe. Sarcasm detection is one of the many challenges in sentiment analysis and is a classification problem. Many past studies have employed various techniques and approaches to detect sarcasm. Although hyperbole is a common approach, used on its own or combined with other approaches such as lexical or pragmatic ones, not all types of hyperbole have been used. Drawing on past research, this study identifies and explores the top five hyperboles for detecting sarcasm: intensifiers, interjections, capital letters, punctuation marks and elongated words. Each hyperbole was identified in 6,600 pre-processed negative-sentiment tweets carrying the hashtags #Chinesevirus, #Kungflu, #COVID19, #Hantavirus and #Coronavirus. The unbiased dataset was analysed using three well-known machine learning algorithms: Support Vector Machine, Random Forest, and Random Forest with Bagging. A total of 81 models were evaluated, using single hyperboles, double hyperboles built from the two most dominant hyperboles, and all hyperbole features together. With hyperbolic words present in the tweets of the unbiased dataset, the proposed model (two-class setup) using the interjection feature achieved an accuracy of 76.61%, 78% precision, 85% recall, an AUC of 74% and an F-score of 82%. The model with all hyperboles achieved an accuracy of 78.89%, 81% precision, 87% recall, an AUC of 76% and an F-score of 84%. The experiments and analysis conducted in this study concluded that hyperboles exist in an unbiased dataset and that they help enhance sarcasm detection. A similar approach was applied to an open dataset, which focused on a lexical approach and an artificial recurrent neural network (RNN).
The proposed model performed well, achieving an accuracy of 89.46% and 90% precision, an increase of 10% in accuracy and of more than 40% in precision. The study also sought to determine which hyperbole is the most significant, and the intensifier was found to be the most significant (p < .0001). This finding agrees with the ablation study, which shows the intensifier to be the predominant hyperbole for detecting sarcasm.
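As a rough illustration of the five hyperbole types named in the abstract, the sketch below flags each one in a single tweet using only the Python standard library. It is not the thesis's implementation: the word lists, the regular expressions, the function name `hyperbole_features` and the feature keys are all assumptions made for this example, and the abstract does not describe the actual lexicons or pre-processing.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration);
# the lexicons used in the thesis are not given in the abstract.
INTENSIFIERS = {"very", "so", "really", "extremely", "totally", "absolutely"}
INTERJECTIONS = {"wow", "oh", "yay", "ugh", "huh", "aha"}

HASHTAG_OR_MENTION_RE = re.compile(r"[#@]\w+")   # e.g. #COVID19, @user
TOKEN_RE = re.compile(r"[A-Za-z']+")             # alphabetic word tokens
ELONGATED_RE = re.compile(r"([A-Za-z])\1{2,}")   # same letter 3+ times in a row
PUNCT_RUN_RE = re.compile(r"[!?]{2,}")           # two or more ! or ? in a row


def hyperbole_features(tweet: str) -> dict:
    """Return one boolean flag per hyperbole type for a single tweet."""
    # Drop hashtags and mentions so that tags such as #COVID19
    # are not counted as capitalised words.
    text = HASHTAG_OR_MENTION_RE.sub(" ", tweet)
    tokens = TOKEN_RE.findall(text)
    lowered = [t.lower() for t in tokens]
    return {
        "intensifier": any(t in INTENSIFIERS for t in lowered),
        "interjection": any(t in INTERJECTIONS for t in lowered),
        # A fully upper-case word longer than one letter (so "I" is ignored).
        "capital_letters": any(len(t) > 1 and t.isupper() for t in tokens),
        "punctuation_marks": bool(PUNCT_RUN_RE.search(text)),
        "elongated_words": any(ELONGATED_RE.search(t) for t in tokens),
    }


if __name__ == "__main__":
    print(hyperbole_features("Wow, this is SOOOO great!!! #COVID19"))
```

Flags of this kind could then be passed, singly or in combination, as input features to a classifier such as a Support Vector Machine or Random Forest, which is the general setup the abstract describes.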
Item Type: | Thesis (PhD) |
---|---|
Additional Information: | Thesis (PhD) - Faculty of Computer Science & Information Technology, Universiti Malaya, 2024. |
Uncontrolled Keywords: | Hyperbole; Sarcasm detection; Sentiment analysis; Machine learning; Social media; Accuracy |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Computer Science & Information Technology |
Depositing User: | Mr Mohd Safri Tahir |
Date Deposited: | 13 Sep 2024 02:01 |
Last Modified: | 13 Sep 2024 02:01 |
URI: | http://studentsrepo.um.edu.my/id/eprint/15401 |