Performance analysis of bacterial genome assemblers using illumina next generation sequencing data / Nur ‘ Ain Mohd Ishak

Nur ‘ Ain , Mohd Ishak (2020) Performance analysis of bacterial genome assemblers using illumina next generation sequencing data / Nur ‘ Ain Mohd Ishak. Masters thesis, Universiti Malaya.


      The advancement of next-generation sequencing (NGS) technology has revolutionized genomic and genetic studies. Compared with conventional methods, NGS generates comprehensive genomic data at a fraction of the cost and with higher accuracy. One of the steps in processing and analyzing NGS data is genome assembly. De novo assembly is the process of assembling short reads into contiguous sections of sequence (contigs) without a reference, in contrast to the conventional approach of mapping reads to a reference. The de Bruijn graph is one of the assembly algorithms widely used for the short-read sequences produced by NGS platforms. This study reports the performance of four de novo assemblers (SPAdes, ABySS, Velvet and MaSuRCA), each of which applies a variant of the de Bruijn graph algorithm, on genomic data generated by the Illumina sequencing platform. The computational performance of the assemblers was compared in terms of running time. The assembled contigs and scaffolds were also evaluated on several quality measures, specifically their length and the contiguity of the assembly, using ABySS-fac. Results showed that on single-end data sets, MaSuRCA and SPAdes generally produced the best results among the four assemblers, with the highest percentage of contigs equal to or longer than 500 bp, the highest total base pairs, the highest N50 and the lowest L50 in most cases. For paired-end data sets, Velvet was suitable for assembling all seven bacterial genome sequences. This comparative study advances current knowledge of de novo genome assembly, which is the first step toward characterizing and revealing whole genomic information. In addition, this work provides a practical guideline that could help researchers identify the appropriate assembler(s) for their research projects.
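
The contiguity measures named in the abstract (contigs of at least 500 bp, total base pairs, N50 and L50) can be illustrated with a minimal Python sketch. This is not the code of ABySS-fac or of the thesis; the function name `contiguity_stats`, the `min_len` parameter and the choice to drop short contigs before computing N50 are assumptions made for illustration only.

```python
def contiguity_stats(lengths, min_len=500):
    """Summarize contig lengths (hypothetical helper, not ABySS-fac itself).

    Assumption: contigs shorter than min_len are discarded before any
    statistic is computed.
    """
    # Keep contigs of at least min_len bp, longest first.
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)

    n50 = 0  # length of the contig at which half of the total is reached
    l50 = 0  # number of contigs needed to reach half of the total
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if running * 2 >= total:
            n50, l50 = length, count
            break

    return {"n": len(kept), "sum": total, "N50": n50, "L50": l50}


if __name__ == "__main__":
    # Toy contig lengths in bp, invented for this example.
    print(contiguity_stats([100, 500, 1000, 2000, 4000]))
```

A higher N50 together with a lower L50 indicates that fewer, longer contigs cover half of the assembled bases, which is why the two values are read together when comparing assemblers.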

      Item Type: Thesis (Masters)
      Additional Information: Dissertation (M.A.) – Faculty of Science, University Malaya, 2020.
      Uncontrolled Keywords: Next generation sequencing (NGS); De novo assembly; De Bruijn graph; Illumina; Bacterial genome assemblers; Genomic information
      Subjects: Q Science > Q Science (General)
      Q Science > QH Natural history > QH301 Biology
      Divisions: Faculty of Science
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 15 Dec 2021 03:07
      Last Modified: 15 Dec 2021 03:07
