Hybrid genetic random forest algorithm for the identification of ISI-indexed articles / Mohammadreza Moohebat

Mohammadreza, Moohebat (2017) Hybrid genetic random forest algorithm for the identification of ISI-indexed articles / Mohammadreza Moohebat. PhD thesis, University of Malaya.


      Abstract

      In the past, human knowledge grew slowly and spread within narrow limits. For instance, news of an innovation made in the UK in the 18th century took several months or even years to reach other parts of the globe. Modern technology and the current educational structure have accelerated this growth. Today, human knowledge grows every hour and is more accessible than ever before. This pace highlights the role of scientific manuscripts in spreading knowledge around the world, since articles published in scientific journals can be expected to concentrate on the cutting edge of knowledge.

      Presently, most of these journals are published in English, but many scientists are not proficient in the language. Inappropriate terms or syntactical style lead to a high rejection rate and to the loss of good research and talent. Reviewing scientific articles for high-quality journals is time-consuming (in some cases it takes up to a year). Furthermore, many inexperienced authors do not follow the scientific writing style of high-quality (ISI) journals and are rejected after waiting several months. A tool that advises authors whether their writing style follows ISI journal standards could therefore save time.

      In this research study, I proposed an automated system that detects the similarity of an article to well-written academic writing by examining various term forms. I chose to advance a novel classification technique to recognize the existing academic patterns. However, it was first necessary to be confident that a classification technique could handle this task; the result of this preliminary step also served as a benchmark. After ensuring that the classification technique was able to accomplish this work, Hybrid Genetic Random Forests (HGRF) was introduced as a new ensemble classifier based on the Random Forest (RF) algorithm, altered slightly with some innovations. To measure the performance of the proposed algorithm, it was evaluated on several independent UCI datasets, and the results were compared with RF and some individual classifiers. In the final stage, HGRF was tested on datasets created from ISI and non-ISI papers, and the result was promising: in most cases, HGRF successfully distinguished ISI articles from non-ISI articles.

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) – Faculty of Computer Science & Information Technology, University of Malaya, 2017.
      Uncontrolled Keywords: Hybrid Genetic Random Forests (HGRF); Scientific articles; ISI-indexed articles; Human knowledge
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      Divisions: Faculty of Computer Science & Information Technology
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 25 May 2018 11:47
      Last Modified: 25 May 2018 11:48
      URI: http://studentsrepo.um.edu.my/id/eprint/7792
