Multi-feature fusion framework for automatic sarcasm identification in Twitter data / Christopher Ifeanyi Eke

Christopher, Ifeanyi Eke (2021) Multi-feature fusion framework for automatic sarcasm identification in Twitter data / Christopher Ifeanyi Eke. PhD thesis, Universiti Malaya.



      Recently, sentiment analysis has gained much recognition in social network research. The aim of sentiment analysis is to determine the polarity of the emotion words in an expression. Analysing people’s sentiments is a process of identifying subjective information in source documents. Identifying people’s opinions (sentiments) about products, politics, services, or individuals brings many benefits to organizations. Sarcasm is a type of sentiment in which people express negative emotions using positive or intensified positive words in a text. In a sarcastic utterance, the intended meaning usually differs from the literal meaning of its words. Various feature engineering techniques, such as Bag-of-Words (BoW), N-grams, and word embeddings, have been investigated to detect sarcasm in textual data automatically. However, these features lose contextual information because the methods ignore the context in which words appear in the text. Furthermore, training data for sarcasm detection is sparse: because of the microblog's word limit, the feature vector that BoW constructs for each sample is mostly null. Moreover, many deep learning methods in Natural Language Processing use word embedding learning as the standard approach to feature vector representation. Nevertheless, a major drawback of word embeddings is that they do not consider the sentiment polarity of words. Consequently, words with opposite polarities, which tend to occur in similar contexts, are mapped to nearby vectors. To address these problems and enhance predictive performance in sarcasm identification, a Multi-Feature Fusion Framework for sarcasm identification with two classification stages is proposed. 
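To make the sparsity problem concrete, the following minimal sketch builds plain Bag-of-Words count vectors for a few short tweets and measures the fraction of zero entries. It is an illustration under stated assumptions, not the thesis's implementation: the tweets are invented, and the regular-expression tokenizer and the helper names (`tokenize`, `build_vocabulary`, `bow_vector`, `sparsity`) are hypothetical choices made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(docs):
    """Sorted list of every distinct token in the corpus."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow_vector(doc, vocab):
    """Bag-of-Words count vector for one document over the shared vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def sparsity(vectors):
    """Fraction of all matrix entries that are zero."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0)
    return zeros / total


# Hypothetical tweets, invented for this example (not from the thesis dataset).
tweets = [
    "I just love waiting in traffic for two hours",
    "Great, another Monday morning meeting",
    "The new phone battery lasts all day",
    "Oh wonderful, my flight got cancelled again",
]

vocab = build_vocabulary(tweets)
vectors = [bow_vector(t, vocab) for t in tweets]
print(len(vocab), sparsity(vectors))  # 28 0.75
```

Each tweet uses only a handful of the corpus vocabulary, so three quarters of the entries are already zero in this four-tweet toy; with a realistic vocabulary of thousands of terms, the proportion of zeros is far higher.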
The first classification stage uses only a lexical feature, extracted with the BoW technique, and trains five standard classifiers (Support Vector Machine, Decision Tree, K-Nearest Neighbor, Logistic Regression, and Random Forest) to predict sarcastic tendency from this lexical feature. In the second stage, the extracted lexical feature is fused with microblog length, hashtag, discourse marker, emoticon, syntactic, pragmatic, semantic (GloVe embedding), and sentiment-related features, and the fused feature set is modelled with the same five classifiers. The effectiveness of the framework was tested through several experiments that measured the classifiers’ performance. The evaluation shows that the classification models built on the framework achieved their highest precision, 94.7%, with the Random Forest classifier. Finally, the results were compared with baseline approaches: the proposed Multi-Feature Fusion Framework improved average detection precision by between 11.2% and 27.1% over the baseline methods. These outcomes show the significance of the proposed framework for sarcasm identification. Thus, the data sparsity issue can be addressed by selecting discriminative features from the sparse training set before the modelling phase, and augmenting the content-based features with contextual information can enhance the predictive performance of sarcasm classification in textual data.
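The second-stage fusion step can be sketched as concatenating the BoW vector with a few auxiliary features, together with the precision metric used to report the results. This is a hedged sketch, not the framework's actual feature extractor: the tiny sentiment lexicons, the discourse-marker list, the emoticon pattern, and all function names are hypothetical, and the syntactic, pragmatic, and GloVe-based semantic features are omitted because they require external resources.

```python
import re
from collections import Counter

# Hypothetical resources, invented for illustration; the abstract does not
# list the lexicons or marker sets actually used in the thesis.
POSITIVE_WORDS = {"love", "great", "wonderful", "best"}
NEGATIVE_WORDS = {"traffic", "cancelled", "waiting", "worst"}
DISCOURSE_MARKERS = {"oh", "just", "another", "again"}
EMOTICON_RE = re.compile(r"[:;]-?[()DP]")  # matches e.g. :) ;( :-D :P


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_features(text, vocab):
    """Stage-one lexical feature: BoW term counts over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]


def auxiliary_features(text):
    """A subset of the stage-two extras: length, hashtags, markers, emoticons, sentiment."""
    tokens = tokenize(text)
    return [
        len(tokens),                                  # microblog length in tokens
        text.count("#"),                              # '#' characters (approximate hashtag count)
        sum(t in DISCOURSE_MARKERS for t in tokens),  # discourse markers
        len(EMOTICON_RE.findall(text)),               # emoticons
        sum(t in POSITIVE_WORDS for t in tokens),     # positive-word count
        sum(t in NEGATIVE_WORDS for t in tokens),     # negative-word count
    ]


def fuse(text, vocab):
    """Feature-level fusion by simple concatenation of the two vectors."""
    return lexical_features(text, vocab) + auxiliary_features(text)


def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


vocab = ["love", "traffic", "monday"]
fused = fuse("Oh I just love Monday traffic #blessed :)", vocab)
print(fused)  # [1, 1, 1, 7, 1, 2, 1, 1, 1]
print(precision([1, 0, 1, 1], [1, 1, 1, 0]))  # 2 TP, 1 FP -> 0.666...
```

In practice, the fused vectors would be passed to the classifiers named above; for instance, scikit-learn's `RandomForestClassifier` and `sklearn.metrics.precision_score` could stand in for the model and the metric, although the abstract does not state which toolkit the thesis used.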

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) – Faculty of Computer Science & Information Technology, Universiti Malaya, 2021.
      Uncontrolled Keywords: Sarcasm identification; Twitter; Machine learning; Feature fusion; Natural language processing
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
      Divisions: Faculty of Computer Science & Information Technology > Dept of Information System
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 26 Jun 2023 06:15
      Last Modified: 26 Jun 2023 06:15
