Mining stack overflow to recommend Java API classes using word embedding and topic modelling / Lee Wai Keat

Lee, Wai Keat (2019) Mining stack overflow to recommend Java API classes using word embedding and topic modelling / Lee Wai Keat. Masters thesis, Universiti Malaya.


      Abstract

To reduce development effort, today’s software development technologies rely heavily on reusable components provided by Application Programming Interfaces (APIs). However, studies have found that APIs have poor usability and that programmers find them difficult to use. A number of factors affect the usability and learning of an API. The most critical one is the API documentation. Therefore, it is unsurprising that developers look for alternative information sources to learn APIs. One such source is the crowd documentation of APIs available on Community Question and Answer (CQA) websites, such as Stack Overflow (SO). Studies have shown that the large volume of data in SO makes it suitable for data mining and analytics for APIs. Following that, this research aims to: 1) identify Java programmers’ common Java programming problems based on their level of expertise, by analyzing Java-related duplicate discussion posts in SO (Study 1); and 2) address the lexical gap between natural language queries and Java API documentation, and the lexical gap between natural language queries and Java code, by designing and implementing an approach for recommending Java API classes for programmers’ natural language queries using data mined from SO (Study 2). Existing studies have found that SO questions/discussion posts have wide coverage of the Java API. Java was chosen in this research as it is a long-established and popular programming language. Study 1 found that the novice group is the top contributor of duplicate questions asked in SO, while the expert group contributes significantly fewer, and that the most common problem Java programmers face is understanding and/or fixing errors, whereas expert programmers ask more about the reasons behind some Java programming concepts.
The proposed approach in Study 2 employs Natural Language Processing techniques, namely word embedding and topic modelling, together with heuristic rules to produce the Java API class recommendations. Benchmarking the proposed approach against an existing state-of-the-art approach using four metrics (Top-K accuracy, Mean Recall @ K, Mean Reciprocal Rank @ K and Mean Average Precision @ K) shows that the proposed approach performs better. The proposed approach was implemented in a Java API class recommender running on a server, and an Eclipse IDE plug-in (APIRecJ) was implemented as the front-end to access the recommender’s functionalities. The results of the user evaluation study show that APIRecJ is generally useful in searching for Java API classes relevant to the programmers’ queries. In summary, the contributions of this research are: a set of common Java programming problems and Java API classes that Java programmers struggle with, to which Java educators and learning resources can devote more attention; an approach for recommending relevant Java API classes for programmers’ queries that outperforms existing approaches; a Java API class recommender; and an Eclipse IDE plug-in that provides assistance on Java API classes relevant to the programmers’ queries within the IDE.
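
The abstract names the four ranking metrics used for benchmarking but does not give their formulas. The following is a minimal sketch of how such metrics are commonly computed, not the thesis's own evaluation code. All function names and the sample queries are hypothetical, and the Average Precision normalisation (dividing by the number of relevant classes found in the top K) is one of several conventions in use; the thesis may use a different one.

```python
from typing import Dict, List, Sequence, Set, Tuple


def top_k_hit(ranked: Sequence[str], relevant: Set[str], k: int) -> bool:
    """True if at least one relevant class appears in the top K."""
    return any(item in relevant for item in ranked[:k])


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant classes that appear in the top K."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant class in the top K, or 0 if none."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Mean of precision@rank over the ranks where a relevant class occurs.

    Assumption: normalised by the number of hits in the top K, and the
    ranked list contains no duplicate entries.
    """
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(queries: List[Tuple[Sequence[str], Set[str]]], k: int) -> Dict[str, float]:
    """Average each per-query metric over all queries."""
    n = len(queries)
    if n == 0:
        raise ValueError("at least one query is required")
    return {
        "top_k_accuracy": sum(top_k_hit(r, g, k) for r, g in queries) / n,
        "mean_recall": sum(recall_at_k(r, g, k) for r, g in queries) / n,
        "mrr": sum(reciprocal_rank_at_k(r, g, k) for r, g in queries) / n,
        "map": sum(average_precision_at_k(r, g, k) for r, g in queries) / n,
    }


# Hypothetical data: (ranked recommendations, ground-truth classes) per query.
SAMPLE_QUERIES: List[Tuple[Sequence[str], Set[str]]] = [
    (["java.util.List", "java.io.File", "java.util.ArrayList"],
     {"java.io.File", "java.util.ArrayList"}),
    (["java.lang.String", "java.util.Scanner"],
     {"java.util.regex.Pattern"}),
]
```

Calling `evaluate(SAMPLE_QUERIES, k=3)` returns one averaged value per metric. Higher is better for all four, so two recommenders can be compared by evaluating both on the same queries with the same K.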

      Item Type: Thesis (Masters)
      Additional Information: Dissertation (M.A.) – Faculty of Computer Science & Information Technology, Universiti Malaya, 2019.
      Uncontrolled Keywords: Mining stack overflow; Java API class recommender; Word embedding; Topic modelling
      Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
      Q Science > QA Mathematics > QA76 Computer software
      Divisions: Faculty of Computer Science & Information Technology
      Depositing User: Mr Mohd Safri Tahir
      Date Deposited: 24 Mar 2022 06:30
      Last Modified: 24 Mar 2022 06:30
      URI: http://studentsrepo.um.edu.my/id/eprint/13077
