Increasing the effectiveness of system-based evaluation for information retrieval systems / Prabha Rajagopal

Prabha, Rajagopal (2018) Increasing the effectiveness of system-based evaluation for information retrieval systems / Prabha Rajagopal. PhD thesis, University of Malaya.

Abstract

Evaluating information retrieval systems is necessary to measure and quantify their effectiveness, assess user satisfaction and acceptance, and compare the performance of retrieval systems. Relevance judgments, system rankings, and statistical significance testing are essential parts of such evaluation. This thesis makes several contributions to information retrieval system evaluation using test collections, through three experiments.

The first experiment explored issues relating to the effort users need to retrieve relevant content from documents. Real users give up easily and do not put in as much effort as expert judges when retrieving relevant content. It is unknown whether deeper evaluation and wider groups of systems show variation in system rankings due to effort. The experiment aims to generate low-effort relevance judgments systematically, determine the variation in system rankings evaluated at different depths and for different groups of systems, and explore how effectively retrieval systems can be evaluated using low-effort relevance judgments with reduced topic sizes. Low-effort relevance judgments are generated using a boxplot approach and standardized readability grades. The findings reveal variation in system rankings at various evaluation depths and across groups of systems, while evaluation with reduced topic sizes shows differing outcomes.

The second experiment explored issues concerning the reliability of system rankings. Evaluating a set of system rankings indicates overall reliability but not the reliability of individual systems. Evaluating with a combination of metrics is versatile in that it can satisfy different user models. The experiment aims to propose an approach for evaluating the reliability of individual system rankings, determine a suitable combination of metrics, understand how system ranking reliability generalizes to other similar metrics, identify the original systems with reliable system rankings, and validate the proposed approach. The proposed intraclass correlation coefficient approach measures the reliability of individual system rankings using relative topic ranks. The average precision and rank-biased precision metrics are recommended for measuring the reliability of individual system rankings. Most of the metric combinations tested generalize well. Highly reliable systems comprise top- and mid-performing systems from the original system ranking. In addition, a strong correlation between the system rankings of the original and proposed approaches validates the proposed reliability measurement for individual retrieval system rankings.

The third experiment explored issues in using averaged or cut-off topic scores for statistical significance testing. The precision at k metric causes varying user experience, while the need for the total number of relevant documents in average precision is infeasible on the ever-changing Web. The experiment aims to propose an approach to overcome the inaccuracy of averaged or cut-off topic scores in statistical significance tests, identify a suitable sample size, and validate the effectiveness of the proposed approach. The approach uses indivisible document-level scores for statistical significance testing. Using document-level scores produced a higher number of statistically significant system pairs than the existing method. Selecting a suitable sample size is necessary for reliable results, and a high percentage of agreement between the proposed and existing methods demonstrates the effectiveness of the proposed document-level approach.
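For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions over binary relevance, not code from the thesis. The function names, the example ranking, and the persistence value p = 0.8 are illustrative assumptions. It shows why average precision needs the topic's total number of relevant documents, while precision at k and rank-biased precision depend only on the ranked list itself.

```python
from typing import Sequence


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(rels[:k]) / k


def average_precision(rels: Sequence[int], total_relevant: int) -> float:
    """Sum of precision at each relevant rank, divided by the total
    number of relevant documents known for the topic."""
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / total_relevant


def rank_biased_precision(rels: Sequence[int], p: float = 0.8) -> float:
    """RBP with persistence p: (1 - p) * sum of rel_i * p**(i - 1)."""
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(rels))


# Hypothetical ranked list for one topic: 1 = relevant, 0 = not relevant.
# Assume the topic has 3 relevant documents overall, one of them not retrieved.
ranking = [1, 0, 1, 0, 0]
print(precision_at_k(ranking, 5))
print(average_precision(ranking, total_relevant=3))
print(rank_biased_precision(ranking, p=0.8))
```

Each function collapses a ranked list into a single topic-level score. These are the averaged or cut-off scores that the third experiment contrasts with document-level scores.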

Item Type:	Thesis (PhD)
Additional Information:	Thesis (PhD) - Faculty of Computer Science & Information Technology, University of Malaya, 2018.
Uncontrolled Keywords:	Information retrieval evaluation; System-oriented; Test collections; Batch experimentation; TREC
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Faculty of Computer Science & Information Technology
Depositing User:	Mr Mohd Safri Tahir
Date Deposited:	25 Jul 2019 04:25
Last Modified:	21 May 2021 04:31
URI:	http://studentsrepo.um.edu.my/id/eprint/8916
