Information fusion and data augmentation with deep features for a deep learning-based baby cry recognition / Zhang Ke

Zhang, Ke (2024) Information fusion and data augmentation with deep features for a deep learning-based baby cry recognition / Zhang Ke. PhD thesis, Universiti Malaya.

PDF (The Candidate's Agreement, 147Kb): restricted to repository staff only.
PDF (Thesis PhD, 2927Kb): restricted to repository staff only until 31 December 2025.

      Abstract

      Deep learning has brought remarkable advances to baby cry recognition and significantly improved its accuracy. Nonetheless, existing research faces three challenges. Firstly, the limited size of the available databases increases the risk of overfitting for a deep learning model. Secondly, current research still suffers from a data imbalance problem, which biases model learning. Thirdly, information fusion has received limited study. The objectives of this study are therefore: firstly, to develop a model based on transfer learning to address overfitting; secondly, to develop a generative adversarial network model that generates new baby cry data to address data imbalance; and thirdly, to develop an information fusion method to improve recognition accuracy. The contributions of this study are threefold. (1) A novel approach called BCRNet is proposed, which combines transfer learning and feature fusion. The BCRNet model takes multi-domain features as input and extracts deep features using a transfer learning model. A multilayer autoencoder is then used for feature reduction, and a Support Vector Machine (SVM) is employed to select the transfer learning model with the highest classification accuracy. The two features are then concatenated to form fused features. Finally, the fused features are fed into a deep neural network (DNN) for classification. Experimental results show that the proposed model mitigates the overfitting caused by small datasets, and that its fused features outperform existing methods that use single-domain features. (2) A Sparse Autoencoder Long Short-Term Memory based Generative Adversarial Network (SLGAN) is proposed to solve the data imbalance problem. The proposed SLGAN model generates new baby cry data so that every cry class has the same number of samples.
Speech features are extracted using Mel spectrograms and the Short-Time Fourier Transform (STFT). Two deep learning models, VGG16 and VGG19, are used to extract the deep features. The deep features are then reduced in dimensionality using Principal Component Analysis (PCA). A sparse autoencoder model is used to fuse the deep features. Finally, the fused features are trained and classified using the DNN. The experimental results show that the proposed method outperforms the existing methods. (3) An improved Dempster-Shafer evidence theory (DST) based on the Wasserstein distance and Deng entropy is proposed to resolve evidence conflict by combining the credibility degree between pieces of evidence with the uncertainty degree of each piece of evidence. To validate the effectiveness of the proposed method, examples are analyzed and the method is applied to baby cry recognition. Whale Optimization Algorithm-Variational Mode Decomposition is used to optimally decompose the baby cry signals. The deep features of the decomposed components are extracted using the VGG16 model. A Long Short-Term Memory (LSTM) model is used to classify the baby cry signals. The improved DST decision method is then used to obtain the decision fusion. The experimental results show that the proposed fusion method reaches the highest recognition accuracy of 90.15%, which is higher than the results of other studies.
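
As an illustration of the decision-fusion step in contribution (3), the following is a minimal sketch of the classical Dempster combination rule together with Deng entropy. It is not the thesis's improved method: the Wasserstein-distance credibility weighting is not reproduced here, and the class labels, mass values, and function names (dempster_combine, deng_entropy) are hypothetical choices made only for this example. Mass functions are represented as dictionaries that map a frozenset of class labels to a belief mass.

```python
import math
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the classical Dempster rule.

    Returns the normalised fused mass function and the conflict K,
    i.e. the total mass assigned to pairs of disjoint focal sets.
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if math.isclose(conflict, 1.0):
        raise ValueError("Evidence is in total conflict; the rule is undefined.")
    fused = {a: v / (1.0 - conflict) for a, v in combined.items()}
    return fused, conflict


def deng_entropy(m):
    """Deng entropy: -sum over focal sets A of m(A) * log2(m(A) / (2^|A| - 1))."""
    return -sum(
        v * math.log2(v / (2 ** len(a) - 1)) for a, v in m.items() if v > 0
    )


# Hypothetical outputs of two classifiers over two illustrative cry classes.
HUNGER = frozenset({"hunger"})
PAIN = frozenset({"pain"})
EITHER = HUNGER | PAIN

m1 = {HUNGER: 0.7, PAIN: 0.2, EITHER: 0.1}
m2 = {HUNGER: 0.6, PAIN: 0.3, EITHER: 0.1}

fused, conflict = dempster_combine(m1, m2)
decision = max(fused, key=fused.get)  # focal set with the largest fused mass
```

In this sketch, agreement between the two sources concentrates mass on the shared hypothesis, while deng_entropy gives a per-source uncertainty score of the kind an improved rule could use to weight evidence before combining it.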

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) - Faculty of Engineering, Universiti Malaya, 2024.
      Uncontrolled Keywords: Baby cry; Data imbalance; Overfitting; Feature fusion; Deep learning
      Subjects: R Medicine > RA Public aspects of medicine
      T Technology > TA Engineering (General). Civil engineering (General)
      Divisions: Faculty of Engineering
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 02 Sep 2024 06:55
      Last Modified: 02 Sep 2024 06:55
      URI: http://studentsrepo.um.edu.my/id/eprint/15371
