Computer vision-based vehicle recognition system using deep learning techniques / Tan Shi Hao

Tan, Shi Hao (2024) Computer vision-based vehicle recognition system using deep learning techniques / Tan Shi Hao. PhD thesis, Universiti Malaya.


      Abstract

      Vehicle recognition is essential to Intelligent Transportation Systems (ITS) in creating a comfortable commuting environment. It enables a diverse range of applications, including roadway maintenance, surveillance systems and electronic tolling. With the aim of improving vehicle type and vehicle make and model recognition (VMMR) performance, past studies are collated and a vehicle recognition taxonomy encompassing sensor-based and Computer Vision (CV)-based solutions is deliberated.

      Motivated to learn superior convolution filters, the first proposal employs Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) as filter learning techniques. The proposed network, dubbed PCA-LDA-Convolutional Neural Network (CNN), also incorporates a parameter-free Channel-Based Attention Module (ChBAM) that tunes the feature responses guided by channel information saliency. The framework delivers 99.6% and 97.8% accuracy on datasets with 30 and 300 vehicle models, respectively, and robustness tests verify that PCA-LDA-CNN is steadfast against image distortions.

      Secondly, past studies reveal that neglecting the degree of informativeness cripples the quality of representation learning. In this regard, a Spatial Attention Module (SAM), empowered by Multi-Head Self-Attention (MHSA), is proposed to scale the feature responses by exploiting spatial relevancy. The proposed ResNet50-SAM model records exceptional performance on the Beijing Institute of Technology (BIT)-Vehicle, Stanford Cars and Web-Nature Comprehensive Cars (CompCarsWeb) datasets, reporting 98.2%, 84.5% and 96.0% accuracy, respectively. A qualitative inspection of the feature embeddings suggests high cohesion within each class, and integrating SAM into other CNNs also leads to considerable improvements. Next, forgoing the low-level details and concentrating solely on high-level features is detrimental to VMMR.
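      The PCA-based filter learning of the first proposal can be illustrated with a minimal patch-based sketch in the style of PCANet: convolution filters are taken as the top principal components of local image patches. All names, shapes and parameter values below are illustrative assumptions, not the thesis's implementation, and the LDA and ChBAM stages are omitted.

```python
import numpy as np

def pca_conv_filters(images, k=7, n_filters=8):
    """Learn k x k convolution filters as the top principal components
    of mean-removed image patches (PCANet-style sketch; illustrative)."""
    patches = []
    for img in images:                         # grayscale (H, W) images assumed
        H, W = img.shape
        for i in range(0, H - k + 1, k):
            for j in range(0, W - k + 1, k):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())   # remove the patch mean
    X = np.stack(patches)                      # (num_patches, k*k)
    # Principal directions = eigenvectors of the patch covariance matrix
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_filters]]
    return top.T.reshape(n_filters, k, k)      # each row becomes one filter bank entry

rng = np.random.default_rng(0)
imgs = rng.standard_normal((4, 28, 28))
filters = pca_conv_filters(imgs)
print(filters.shape)  # (8, 7, 7)
```

      The learned filters are orthonormal by construction, which is one intuition behind using PCA as a filter learner: each filter captures a distinct direction of patch variance instead of being randomly initialised.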
      The Cross Granularity (CG) module, in contrast, integrates both to render a balanced mix of local contextual information and global semantic details. The combination of ResNet50 and the CG module attains 98.6%, 95.4%, 86.4% and 99.1% accuracy on the CompCarsWeb, Stanford Cars, Car-FG3K and Surveillance-Nature Comprehensive Cars (CompCarsSV) datasets, respectively. The qualitative analysis further unveils its strong ability to locate distinctive fine-grained vehicle details, and the CG module is highly compatible with various backbone CNNs.

      As the fourth proposal, the Coarse-to-Fine Context Aggregation (CFCA) module presents a parameter-efficient multi-scale feature learning paradigm. The cross-scale features are generated by first refining the scale-specific components independently and then fusing them nonlinearly through convolution. The multi-scale feature maps produce 98.0%, 95.1%, 86.2%, 99.0% and 96.9% accuracy on the CompCarsWeb, Stanford Cars, Car-FG3K, CompCarsSV and Mohsin-VMMR datasets, respectively. Moreover, the neurons exhibit high feature responses on the discriminative vehicle parts, corroborating the superior feature extraction ability of the CFCA module.

      The fifth proposal presents an Augmented-Granularity (AG) module that executes grouped focus convolution (GFConv) to compose multi-granularity features. With a spatial-to-channel transformation, GFConv doubles the receptive field whilst mitigating information loss. When the AG module is paired with TResNet-L, the network claims 87.8%, 95.5%, 98.6% and 92.5% accuracy on the Car-FG3K, Stanford Cars, CompCarsWeb and VMMRdb datasets, respectively. The dissection of the feature embeddings affirms the ability of the AG module to reduce intra-class variance, and the AG module brings a 2.7% accuracy improvement on average across four backbone CNNs.
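      The spatial-to-channel idea behind GFConv can be sketched with a standard space-to-depth rearrangement: spatial resolution drops by a factor r while the channel count grows by r*r, so a subsequent convolution sees an r-times larger receptive field without any activations being discarded. This is a generic sketch of the transformation, not the thesis's GFConv implementation; the grouping and convolution stages are omitted.

```python
import numpy as np

def space_to_channel(x, r=2):
    """Rearrange a (C, H, W) feature map into (C*r*r, H/r, W/r).
    No values are dropped; they are relocated into new channels
    (space-to-depth sketch of the spatial-to-channel transformation)."""
    C, H, W = x.shape
    assert H % r == 0 and W % r == 0, "spatial dims must be divisible by r"
    x = x.reshape(C, H // r, r, W // r, r)
    x = x.transpose(0, 2, 4, 1, 3)             # gather the r*r spatial offsets
    return x.reshape(C * r * r, H // r, W // r)

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
y = space_to_channel(x)
print(y.shape)             # (8, 2, 2)
print(x.sum() == y.sum())  # True: every activation survives the rearrangement
```

      Because the rearrangement is lossless, a 3x3 convolution applied after it covers the same input extent as a 6x6 convolution on the original map, which matches the stated receptive-field doubling at r = 2.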

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) - Faculty of Engineering, Universiti Malaya, 2024.
      Uncontrolled Keywords: Attention; Convolutional neural network; Fine-grained visual classification; Multi-scale; Vehicle recognition
      Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
      Divisions: Faculty of Engineering
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 20 Feb 2025 02:03
      Last Modified: 20 Feb 2025 02:03
      URI: http://studentsrepo.um.edu.my/id/eprint/15555
