Speech emotion recognition using bidirectional echo state network with random projection / Hemin Fatih Ibrahim

Hemin Fatih, Ibrahim (2022) Speech emotion recognition using bidirectional echo state network with random projection / Hemin Fatih Ibrahim. PhD thesis, Universiti Malaya.

    PDF (Thesis PhD)
    Restricted to Repository staff only until 31 December 2024.


      Speech is an effective, quick, and important way of communicating and exchanging complex information between humans. Emotions have always been part of normal human conversation, making speech more engaging and more effective. Because both speech and emotion play a major role in human life, many researchers have been drawn to Speech Emotion Recognition (SER), which is considered a key effort in Human-Computer Interaction (HCI). An accurate SER system can play an effective role in several services, such as call centers, in-car on-board systems, educational systems, and child care. The major challenges in SER are capturing and extracting the most relevant, distinctive emotion features from the raw speech signal and building a robust, computationally cheap model. The main focus of this thesis is to design a model for emotion recognition from speech signals, an area that still poses many challenges, and to adopt the most relevant features. This thesis tackles these challenges by proposing a multivariate time series classification approach based on reservoir computing for detecting emotions from speech. Because emotion in speech is sequential and sparse in nature, handcrafted multivariate time series features are adopted as input data. The bidirectional Echo State Network (ESN), a type of reservoir computing and a special case of the Recurrent Neural Network (RNN), is adopted to avoid model complexity, because its reservoir is untrained and sparse when mapping the features into a higher-dimensional space. Although the ESN has advantages, some problems remain, such as the instability caused by randomly initializing the fixed weights and the selection of optimal hyperparameter values, both of which strongly affect ESN performance. Therefore, to address these issues, a bidirectional ESN with twin reservoirs is adopted to capture additional independent information from each direction. 
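The idea of a bidirectional reservoir can be illustrated with a minimal sketch. It is not the thesis implementation: it uses a single reservoir rather than twin reservoirs, takes the last state of each direction as the representation, and all function names, sizes, and hyperparameter values (`make_reservoir`, 50 units, spectral radius 0.9, and so on) are assumptions chosen for illustration. The input is a hypothetical matrix of handcrafted frame-level features with shape (time steps, features).

```python
import numpy as np


def truncated_normal(rng, shape, std=0.5, bound=2.0):
    """Draw normal samples and redraw any that fall outside +/- bound * std."""
    x = rng.normal(0.0, std, size=shape)
    mask = np.abs(x) > bound * std
    while mask.any():
        x[mask] = rng.normal(0.0, std, size=int(mask.sum()))
        mask = np.abs(x) > bound * std
    return x


def make_reservoir(n_inputs, n_units, spectral_radius=0.9, density=0.1, seed=0):
    """Create fixed (never trained) input and recurrent weights."""
    rng = np.random.default_rng(seed)
    w_in = truncated_normal(rng, (n_units, n_inputs))
    # Sparse recurrent matrix: keep roughly `density` of the connections.
    w = rng.uniform(-1.0, 1.0, size=(n_units, n_units))
    w[rng.random((n_units, n_units)) > density] = 0.0
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    radius = np.max(np.abs(np.linalg.eigvals(w)))
    if radius > 0:
        w *= spectral_radius / radius
    return w_in, w


def run_reservoir(inputs, w_in, w):
    """Drive the reservoir with a sequence and return every state."""
    x = np.zeros(w.shape[0])
    states = np.zeros((inputs.shape[0], w.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(w_in @ u + w @ x)
        states[t] = x
    return states


def bidirectional_representation(inputs, w_in, w):
    """Concatenate the final states of a forward and a time-reversed pass."""
    forward = run_reservoir(inputs, w_in, w)
    backward = run_reservoir(inputs[::-1], w_in, w)
    return np.concatenate([forward[-1], backward[-1]])


if __name__ == "__main__":
    # Hypothetical utterance: 40 frames of 13 handcrafted features.
    features = np.random.default_rng(1).normal(size=(40, 13))
    w_in, w = make_reservoir(n_inputs=13, n_units=50)
    representation = bidirectional_representation(features, w_in, w)
    print(representation.shape)  # (100,)
```

Only a readout (for example a linear classifier trained on such representations) would need fitting; the reservoir weights stay fixed, which is where the low training cost of this family of models comes from.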
Additionally, late fusion of the same-direction outputs of the twin reservoirs yields a more informative representation and enhances the memorization capability for SER applications. A truncated normal distribution is used to initialize the random input connection weights, and the hyperparameters of the ESN model are optimized with Bayesian optimization and Population Based Training (PBT). Moreover, the high-dimensional sparse output of a reservoir makes the feature representation suffer from the curse of dimensionality. For that reason, Sparse Random Projection (SRP) is adopted for dimensionality reduction: it offers significant computational advantages because it needs no training, and it removes redundancies with minimal loss of information. Under a speaker-independent evaluation strategy, the proposed model achieved unweighted average recalls of 89.21%, 70.48%, 76.76%, and 46.34% on the Emo-DB, SAVEE, RAVDESS, and FAU Aibo datasets, respectively. These results show that the proposed model outperforms a set of other methods on four publicly available emotional speech datasets.
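Two of the ingredients named in the abstract, sparse random projection and unweighted average recall (UAR), can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the thesis code: the function names, the default density of 1/sqrt(number of features), and the toy labels are all hypothetical choices. The projection matrix is drawn once at random and never trained; UAR is the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has.

```python
import numpy as np


def sparse_random_projection(features, n_components, density=None, seed=0):
    """Reduce the columns of `features` with a fixed sparse random matrix."""
    rng = np.random.default_rng(seed)
    n_features = features.shape[1]
    if density is None:
        # Commonly used default; an assumption of this sketch.
        density = 1.0 / np.sqrt(n_features)
    # Non-zero entries are +/- scale, chosen so squared norms are
    # preserved in expectation.
    scale = np.sqrt(1.0 / (density * n_components))
    signs = rng.choice([-1.0, 1.0], size=(n_features, n_components))
    mask = rng.random((n_features, n_components)) < density
    projection = scale * signs * mask
    return features @ projection


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    # Hypothetical reservoir outputs: 5 utterances, 200 dimensions each.
    high_dim = np.random.default_rng(0).normal(size=(5, 200))
    low_dim = sparse_random_projection(high_dim, n_components=20)
    print(low_dim.shape)  # (5, 20)

    # Imbalanced toy labels: class 0 recall is 3/4, class 1 recall is 1/2.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 0]
    print(unweighted_average_recall(y_true, y_pred))  # 0.625
```

In the toy example plain accuracy would be 4/6, while UAR is 0.625, which shows how the metric avoids rewarding a model for favouring the majority class.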

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) – Faculty of Computer Science & Information Technology, Universiti Malaya, 2022.
      Uncontrolled Keywords: Speech emotion recognition; Echo state network; Random projection; Recurrent neural network; Reservoir computing
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      Divisions: Faculty of Computer Science & Information Technology > Dept of Artificial Intelligence
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 05 Jul 2023 07:00
      Last Modified: 05 Jul 2023 07:00
      URI: http://studentsrepo.um.edu.my/id/eprint/14562
