Static and adaptive indexing framework for big data using predictor logic / Aisha Siddiqa

Aisha, Siddiqa (2017) Static and adaptive indexing framework for big data using predictor logic / Aisha Siddiqa. PhD thesis, University of Malaya.



      Big data grow exponentially, come in various forms, and require efficient data processing systems for fast retrieval. The disruptive features associated with big data have drawn attention from both research and industry, and research efforts aim to find viable solutions that improve data retrieval performance for better insight. Indexing has contributed substantially to search performance on big data sets, and researchers have used many indexing structures for big data, including clustered and non-clustered indexes. However, because data size increases continuously, contemporary big data indexing mechanisms cannot deliver efficient query responses. Clustered indexing approaches are constrained by the number of replicas and therefore cannot offer indexes on a sufficient number of attributes, whereas non-clustered indexing implementations incur high indexing overhead. As a result, existing big data indexing structures are unable to achieve the maximum index hit ratio. The aim of this study is to expedite data retrieval for big data search queries with minimum indexing overhead and maximum index hit ratio by using a non-clustered indexing approach. Static indexes are created from a user-provided list of index attributes before query execution starts and are then updated adaptively as the query workload changes, to obtain an increased index hit ratio. We investigate contemporary big data indexing implementation and analyze its inefficiency in index creation time and index size. Furthermore, we observe that, because clustered indexing approaches make only a limited number of indexes available, most queries are executed without using indexes. Thus, we propose a novel indexing framework for big data, named SmallClient, with minimized indexing overhead, improved search performance, and improved index hit ratio. SmallClient leverages the B-Tree indexing structure and uses novel predictor logic for indexing.
To validate the performance of the framework, we collected data on indexing overhead (in terms of both indexing time and index size), as well as on search performance and index hit ratio, for static and adaptive indexing, respectively. We use benchmarking and mathematical modeling to verify the SmallClient results. The indexing time results show that SmallClient decreases indexing time overhead by up to 32% from the 47% taken by the Lucene indexing library. Similarly, index size overhead is 41% for large data sets on which Lucene fails to create indexes. The results also show that the search performance of SmallClient is more than 92% when data uploading cost is excluded, and that the framework achieves an improved index hit ratio by adaptively updating indexes.
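
The interplay between static indexes, adaptive updates, and index hit ratio described above can be illustrated with a minimal Python sketch. It is not SmallClient and does not reproduce the thesis's predictor logic, which the abstract does not specify. The class name `AdaptiveIndexSet`, the miss-count `threshold` rule, and the attribute names are all hypothetical, chosen only to show how a hit ratio might rise as indexes follow the workload.

```python
from collections import Counter


class AdaptiveIndexSet:
    """Toy model: a static attribute list that adapts to the query workload."""

    def __init__(self, static_attributes, threshold=3):
        # Static phase: indexes named by the user before any query runs.
        self.indexed = set(static_attributes)
        # Hypothetical stand-in for predictor logic: index an attribute
        # once it has been searched without an index `threshold` times.
        self.threshold = threshold
        self.misses = Counter()
        self.hits = 0
        self.queries = 0

    def query(self, attribute):
        """Record one search on `attribute`; return True on an index hit."""
        self.queries += 1
        if attribute in self.indexed:
            self.hits += 1
            return True
        self.misses[attribute] += 1
        if self.misses[attribute] >= self.threshold:
            # Adaptive phase: later queries on this attribute will hit.
            self.indexed.add(attribute)
        return False

    def hit_ratio(self):
        """Fraction of queries so far that were served by an index."""
        return self.hits / self.queries if self.queries else 0.0


indexes = AdaptiveIndexSet(["user_id"], threshold=2)
workload = ["user_id", "city", "city", "city", "user_id", "city"]
results = [indexes.query(attribute) for attribute in workload]
print(results, round(indexes.hit_ratio(), 2))
```

In this toy workload the first two searches on `city` miss, the second miss triggers index creation, and every later search on `city` hits, so the hit ratio ends higher than a purely static index on `user_id` alone would give.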

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) – Faculty of Computer Science & Information Technology, University of Malaya, 2017.
      Uncontrolled Keywords: Predictor logic; Data processing systems; Indexing; SmallClient; Non-clustered indexing
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      Q Science > QA Mathematics > QA76 Computer software
      Divisions: Faculty of Computer Science & Information Technology
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 09 Sep 2020 03:28
      Last Modified: 09 Sep 2020 03:28
