Mat @ Mohamad, Nurul Jannah (2014) De Novo assembly of an unknown geminivirus / Nurul Jannah Binti Mat @ Mohamad. Masters thesis, University of Malaya.
Abstract
Next-generation sequencing (NGS), also known as high-throughput sequencing, is now fast and cheap enough to be part of the toolbox for investigating unknown viruses. The Illumina Genome Analyzer is one of the NGS platforms, and it produces significantly larger volumes of sequence data than conventional sequencing methods. The short reads it generates can be used for de novo assembly. This study therefore performed de novo assembly of an unknown geminivirus from reads generated by the Illumina Genome Analyzer. The assembly was carried out with SOAPdenovo, and only one scaffold (C11095) mapped to the geminivirus genomes. Genes were then predicted on this scaffold with GeneMark.hmm, which identified five open reading frames (ORFs) as genes. The function of each predicted gene was annotated with three annotation resources: InterPro, Gene Ontology (GO) and UniProt. For example, InterPro indicated that gene 1 encodes the geminivirus AL3 coat protein, UniProt indicated that gene 1 encodes the replication enhancement protein, and GO annotated gene 1 as involved in the viral process (biological process). The predicted genes were compared with the geminivirus genomes using BRIG (BLAST Ring Image Generator). The BRIG image shows that a large segment of the unknown geminivirus sequence, between 1000 bp and 1300 bp, was missing. The gene comparison showed that the unknown geminivirus resembles the other geminivirus genomes in that all of them encode the coat protein and the replication-associated protein. It differs from them in encoding a replication enhancement protein (gene 1), a hypothetical protein (gene 3) and a glyoxylate carboligase (gene 5).
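SOAPdenovo is a de Bruijn graph assembler, and GeneMark.hmm predicts genes with a hidden Markov model. As a toy illustration of these two steps, not of either tool, the Python sketch below builds a de Bruijn graph from short reads, walks one unambiguous path into a contig, and then scans all six reading frames for ATG-to-stop ORFs. The reads, `k` and the minimum ORF length are hypothetical. The naive ORF scan is a deliberately simpler stand-in for GeneMark.hmm's statistical model, and the assembler ignores reverse-complement k-mers, sequencing errors and repeats, which real assemblers must handle.

```python
from collections import defaultdict

# Watson-Crick complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Link each (k-1)-mer prefix to the (k-1)-mer suffixes of the reads' k-mers.

    Forward strand only; real assemblers also add reverse complements and
    correct sequencing errors before building the graph.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(succ) for node, succ in graph.items()}


def walk_contig(graph: dict[str, list[str]]) -> str:
    """Extend one unambiguous path (unitig) from a node with no incoming edges."""
    indegree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1
    # Prefer a source node; fall back to any node (e.g. a circular genome).
    node = next((n for n in graph if indegree[n] == 0), next(iter(graph)))
    contig, visited = node, {node}
    # Stop at branches, merges, dead ends or a revisited node.
    while len(graph.get(node, [])) == 1:
        nxt = graph[node][0]
        if indegree[nxt] != 1 or nxt in visited:
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[str, int, int, int, str]]:
    """Naive six-frame scan for ATG...stop ORFs.

    Returns (strand, frame, start, end, orf); coordinates are on the scanned
    strand (the reverse complement for "-"). `min_codons` counts codons
    from the ATG up to, but not including, the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] == "ATG":
                    for j in range(i, len(s) - 2, 3):
                        if s[j:j + 3] in STOP_CODONS:
                            if (j - i) // 3 >= min_codons:
                                orfs.append((strand, frame, i, j + 3, s[i:j + 3]))
                            i = j  # resume scanning after this stop codon
                            break
                    else:
                        break  # no in-frame stop: open-ended ORF, skip this frame
                i += 3
    return orfs


if __name__ == "__main__":
    # Hypothetical toy reads overlapping a short 20 bp sequence.
    reads = ["TTATGGCACGAT", "GCACGATTCTGA", "ATTCTGAAGC"]
    contig = walk_contig(build_de_bruijn(reads, k=5))
    print(contig)
    print(find_orfs(contig))
```

On real data the graph branches at repeats and errors, which is why assemblers such as SOAPdenovo produce many contigs and scaffolds rather than a single path.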
The phylogenetic analysis grouped the geminiviruses into East Asian (China, Taiwan and Japan) and Southeast Asian (Malaysia, Indonesia, the Philippines and Vietnam) viruses. The unknown geminivirus (candidate virus) fell within the Southeast Asian group. The tree indicates that the unknown geminivirus shares a common ancestor with the Tobacco leaf curl Indonesia virus C1, V2 and V1 genes for the replication-associated protein, putative V2 protein and coat protein (partial and complete cds). These results suggest that the unknown geminivirus could be a Southeast Asian strain that may infect tobacco plants. The main aim of this study was to demonstrate the process of identifying unknown sequence reads generated by the Illumina Genome Analyzer.
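The abstract does not state which tree-building method was used. As a minimal, hypothetical sketch of how a distance-based tree groups related sequences, the Python below computes p-distances (the proportion of differing sites) between pre-aligned sequences and clusters them with UPGMA, returning a Newick topology. UPGMA is chosen here only for brevity and assumes a roughly constant rate of evolution; methods such as neighbor-joining or maximum likelihood are more commonly used for virus phylogenies. The sequence names and data are invented for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free sites to compare")
    return sum(x != y for x, y in sites) / len(sites)


def upgma(aligned: dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA and return a Newick topology (no branch lengths)."""
    # cluster id -> (Newick subtree, number of sequences in the cluster)
    clusters = {name: (name, 1) for name in aligned}
    dist = {
        frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(aligned, 2)
    }
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        a, b = sorted(min(dist, key=dist.get))
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"{a}+{b}"
        # UPGMA: distance to the new cluster is the size-weighted average.
        for other in clusters:
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * size_a
                + dist[frozenset((b, other))] * size_b
            ) / (size_a + size_b)
        # Drop distances that involve the two clusters just merged.
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        clusters[merged] = (f"({tree_a},{tree_b})", size_a + size_b)
    ((tree, _),) = clusters.values()
    return tree + ";"


if __name__ == "__main__":
    # Hypothetical aligned toy sequences, not real geminivirus data.
    toy = {"X": "AAAAAAAAAA", "Y": "AAAAAAAAAT", "Z": "TTTTTAAAAA"}
    print(upgma(toy))
```

In such a tree, sequences that join early (here the two most similar toy sequences) form a clade, which is the sense in which a candidate virus "falls within" a regional group and shares a common ancestor with its nearest relatives.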