Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem

Ibrahim Abaker Targio Hashem (2017) Optimisation model for scheduling MapReduce jobs in big data processing. PhD thesis, University of Malaya.

      Abstract

      With the rapid development of Internet-based technologies, data generation has increased drastically over the past few years, ushering in what has been termed the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is the MapReduce computational paradigm. Scheduling plays an important role in MapReduce, mainly in reducing the execution time of data-intensive jobs. However, despite recent efforts toward improving MapReduce performance, scheduling MapReduce jobs across multiple nodes has been shown to be a multi-objective optimization problem. The problem becomes even more complex when virtualized clusters in a cloud computing environment are used to execute a large number of tasks. The complexity lies in achieving multiple objectives that may conflict with one another. These conflicting requirements and goals are challenging to optimize because it is difficult to predict a new incoming job's behavior and its completion time. In this study, we aim to optimize task scheduling and resource utilization using an evolutionary algorithm based on the proposed completion time and monetary cost of cloud service models. Two multi-objective approaches, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm II (SPEA2), are applied to find the Pareto front of the makespan and total cost. The results of our experimental analysis reveal the advantage of NSGA-II over SPEA2 on the tested problems, based on the adopted measuring criteria. In addition, the NSGA-II algorithm was able to find the optimal solutions. We then propose a multi-objective scheduling algorithm framework that considers resource allocation and task scheduling in a heterogeneous cloud environment. The proposed algorithm is evaluated using task scheduling in the scheduling load simulator and validated using statistical modeling. The simulation results acquired from the experiments showed the effectiveness of the proposed framework and algorithm.
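
To illustrate what a Pareto front of makespan and total cost means, here is a minimal sketch of a non-dominance filter over candidate schedules. It is not the thesis's implementation and it is not NSGA-II or SPEA2 themselves. The names `dominates` and `pareto_front` and the sample (makespan, cost) values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A candidate schedule summarised by two objectives, both to be minimised:
# (makespan, total monetary cost). The values below are made up.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical schedules: faster ones tend to cost more.
candidates = [
    (120.0, 9.5),
    (100.0, 12.0),
    (130.0, 9.0),
    (110.0, 12.5),  # dominated by (100.0, 12.0)
    (100.0, 14.0),  # dominated by (100.0, 12.0)
]

front = pareto_front(candidates)
print(front)
```

Each schedule left in `front` trades makespan against cost: none can be improved in one objective without getting worse in the other. Evolutionary methods such as NSGA-II and SPEA2 search for an approximation of this set rather than enumerating all schedules.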

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) - Faculty of Computer Science & Information Technology, University of Malaya, 2017.
      Uncontrolled Keywords: Big data processing; Internet-based technology; MapReduce; Model
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      Divisions: Faculty of Computer Science & Information Technology
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 06 May 2019 06:44
      Last Modified: 06 May 2019 06:44
      URI: http://studentsrepo.um.edu.my/id/eprint/9755
