Use of web page credibility information in increasing the accuracy of web-based question answering systems / Asad Ali Shah

Asad, Ali Shah (2017) Use of web page credibility information in increasing the accuracy of web-based question answering systems / Asad Ali Shah. PhD thesis, University of Malaya.


      Abstract

      Question Answering (QA) systems offer an efficient way of providing precise answers to questions asked in natural language. In a Web-based QA system, the answers are extracted from information sources such as Web pages. These Web-based QA systems are effective at finding relevant Web pages, but they either do not evaluate the credibility of Web pages or evaluate only two or three of the seven credibility categories. Unfortunately, much of the information available on the Web is biased, false, or fabricated. Extracting answers from such Web pages leads to incorrect answers, decreasing the accuracy of Web-based QA systems and of other systems that rely on Web pages. Most previous and recent studies on Web-based QA systems focus primarily on improving Natural Language Processing and Information Retrieval techniques for scoring answers, without assessing the credibility of Web pages. This research proposes a credibility assessment algorithm that evaluates Web pages and uses their credibility scores to rank answers in Web-based QA systems. The proposed algorithm scores credibility using seven categories: correctness, authority, currency, professionalism, popularity, impartiality, and quality, where each category consists of one or more credibility factors. This research attempts to improve accuracy in Web-based QA systems by first developing a prototype Web-based QA system, named the Optimal Methods QA (OMQA) system, which uses the methods that produce the highest answer accuracy, and then extending it with a credibility assessment module to form the Credibility-based OMQA (CredOMQA) system. Both the OMQA and CredOMQA systems have been evaluated with respect to answer accuracy using two quantitative evaluation metrics: 1) percentage of queries correctly answered and 2) Mean Reciprocal Rank (MRR).
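The two evaluation metrics named above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the `top_n` cut-off, and the convention that a rank of 0 means "no correct answer returned" are assumptions made for illustration only.

```python
from typing import Sequence, Set


def first_correct_rank(ranked: Sequence[str], gold: Set[str]) -> int:
    """Return the 1-based rank of the first correct answer, or 0 if none is correct.

    Matching is a simple case-insensitive string comparison (an assumption;
    real QA evaluations usually use answer patterns).
    """
    gold_lower = {g.lower() for g in gold}
    for position, answer in enumerate(ranked, start=1):
        if answer.lower() in gold_lower:
            return position
    return 0


def percent_correct(ranks: Sequence[int], top_n: int = 1) -> float:
    """Percentage of questions whose first correct answer is within the top_n ranks."""
    hits = sum(1 for r in ranks if 0 < r <= top_n)
    return 100.0 * hits / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank over all questions; unanswered questions contribute 0."""
    return sum(1.0 / r for r in ranks if r > 0) / len(ranks)


# Four hypothetical questions: answered at rank 1, rank 2, not at all, rank 4.
ranks = [1, 2, 0, 4]
print(percent_correct(ranks, top_n=1))  # 25.0
print(mean_reciprocal_rank(ranks))      # 0.4375
```

MRR rewards a system for placing the correct answer near the top of the list even when it is not first, which is why it is often reported alongside the plain percentage of correctly answered queries.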
Extensive quantitative experiments and analyses have been conducted on 211 factoid questions taken from the TREC QA track from 1999, 2000 and 2011, and on a random sample of 21 questions from the CLEF QA track, to support the comparisons and conclusions. Results from the evaluation of methods and techniques show that some techniques improved answer accuracy more than others performing the same function. In some cases, a combination of techniques produced higher answer accuracy than any of them used individually. Including Web page credibility scores significantly improved the accuracy of the system. Among the seven credibility categories, four (correctness, professionalism, impartiality and quality) had a major impact on answer accuracy, whereas authority, currency and popularity played a minor role. The results conclusively establish that the proposed CredOMQA system performs better than other Web-based QA systems. It also outperforms other credibility-based QA systems, which employ credibility assessment only partially. These results are expected to help researchers and experts select the Web-based QA methods and techniques that produce higher answer accuracy, and evaluate the credibility of sources using the credibility assessment module to improve the accuracy of existing and future information systems. The proposed algorithm can also help in designing credibility-based information systems in areas that require accurate and credible information, such as education, health, stocks, networking and media, and would help enforce new Web-publishing standards, thus enhancing the overall Web experience.
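To make the idea of credibility-based answer ranking concrete, the following Python sketch combines per-category scores into one page credibility score and blends it with an answer score. The abstract does not give the thesis's actual formulas, so the weighted mean, the `alpha` blending parameter, the score ranges, and all function names here are illustrative assumptions rather than the CredOMQA method.

```python
from typing import Dict, List, Tuple

# The seven credibility categories listed in the abstract.
CATEGORIES = ("correctness", "authority", "currency", "professionalism",
              "popularity", "impartiality", "quality")


def credibility_score(category_scores: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Weighted mean of per-category scores, each assumed to lie in [0, 1].

    A missing category is treated as 0.0 (an assumption).
    """
    total_weight = sum(weights[c] for c in CATEGORIES)
    weighted = sum(weights[c] * category_scores.get(c, 0.0) for c in CATEGORIES)
    return weighted / total_weight


def rank_answers(candidates: List[Tuple[str, float, float]],
                 alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Rank answers by a blend of answer score and source page credibility.

    Each candidate is (answer, answer_score, page_credibility). alpha = 1.0
    ignores credibility; alpha = 0.0 uses credibility alone. Scores for the
    same answer found on several pages are summed.
    """
    totals: Dict[str, float] = {}
    for answer, answer_score, credibility in candidates:
        combined = alpha * answer_score + (1.0 - alpha) * credibility
        totals[answer] = totals.get(answer, 0.0) + combined
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidates: "Lyon" scores higher on relevance alone, but comes
# from a page with low credibility.
candidates = [("Paris", 0.6, 0.9), ("Lyon", 0.8, 0.2)]
print(rank_answers(candidates, alpha=1.0)[0][0])  # Lyon
print(rank_answers(candidates, alpha=0.5)[0][0])  # Paris
```

In this toy example the credibility term changes which answer is ranked first. The finding that some categories matter more than others would correspond, in a scheme like this, to giving those categories larger weights.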

      Item Type: Thesis (PhD)
      Additional Information: Thesis (PhD) – Faculty of Computer Science & Information Technology, University of Malaya, 2017.
      Uncontrolled Keywords: Web page; Web-publishing standards; Question Answering (QA) systems; Information sources; Credibility information
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      Divisions: Faculty of Computer Science & Information Technology
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 23 May 2018 10:18
      Last Modified: 23 May 2018 10:18
      URI: http://studentsrepo.um.edu.my/id/eprint/7815
