Feature extraction from natural language to aid requirements reuse in software product lines engineering / Noor Hasrina Bakar

Noor Hasrina, Bakar (2016) Feature extraction from natural language to aid requirements reuse in software product lines engineering / Noor Hasrina Bakar. PhD thesis, University of Malaya.


    Software Product Lines Engineering (SPLE) is a systematic approach to realising software reuse. The software assets that can be reused include architectural documents, test cases, source code, and requirements. Requirements Reuse (RR) in SPLE is the process of systematically taking requirements defined for an earlier software product and applying them to a new, slightly different product within a similar domain. Software Requirements Specification (SRS) documents are not easily accessible, so many researchers in this area use other forms of requirements, including product brochures, user manuals, and software reviews, when an SRS is not available. Unfortunately, extracting reusable features from natural language requirements is not easy. Done manually, the task can be complicated, expensive, and error-prone. Many research efforts in SPLE have focused on architecture, design, and code reuse, while requirements reuse has received somewhat less attention from researchers and practitioners. Results from an exploratory survey of SE practitioners indicate that the main impediments to RR practice include the unavailability of RR tools or models for adoption, the condition of the existing requirements to be reused (incomplete, poorly structured, or not kept up to date), and a lack of awareness of systematic RR among software practitioners. Additionally, a Systematic Literature Review (SLR) of feature extraction approaches for RR in SPLE reveals a mixture of automated and semi-automated approaches drawn from data mining and information retrieval, only some of which come with support tools. The SLR also reveals that most of the support tools proposed in the selected studies are not publicly available, which makes adoption by practitioners difficult.
Motivated by these findings, this research proposes a process model for feature extraction from natural language requirements for reuse (FENL). FENL consists of four main phases: Accessing Requirements, Terms Extraction, Feature Identification, and Formation of Feature Model. The proposed model is demonstrated through a lab experiment that uses online software reviews as input. In phase 1, software reviews are fetched from the Internet. In phase 2, these reviews undergo text pre-processing. In phase 3, Latent Semantic Analysis (LSA) and tf-idf term weighting are used to determine document relatedness; linguistic tagging is then applied to extract software features, followed by simple clustering algorithms that form groups of common features. In phase 4, the grouped common features are passed to the feature modelling process, and a feature diagram is constructed manually as the final output. The results of the proposed semi-automated extraction are compared with those obtained by a manual extraction procedure performed by teachers and software practitioners. Comparisons are made in terms of accuracy metrics (precision, recall, and F-measure) and time efficiency. The proposed approach obtained a recall of up to 85.95% (78.03% average) and a precision of up to 80.16% (58.63% average) when evaluated against a truth data set created manually. Additionally, when compared with related work, FENL achieves a comparable F-measure.
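
The accuracy metrics named above can be illustrated with a minimal sketch. It assumes that extracted features and the manually created truth set are compared by exact string match, which may differ from the matching procedure used in the thesis. The function name evaluate_extraction and the feature names are hypothetical and chosen only for illustration; they are not results from the study.

```python
def evaluate_extraction(extracted, truth):
    """Return (precision, recall, f_measure) for extracted features.

    Assumption: a feature counts as correct only if the same string
    appears in the manually created truth set (exact match).
    """
    extracted, truth = set(extracted), set(truth)
    true_positives = len(extracted & truth)

    # Precision: share of extracted features that are in the truth set.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of truth-set features that were extracted.
    recall = true_positives / len(truth) if truth else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical feature names, not data from the thesis.
truth = {"photo editing", "file sharing", "cloud backup", "password lock"}
extracted = {"photo editing", "file sharing", "cloud backup",
             "dark mode", "ad banner"}

precision, recall, f_measure = evaluate_extraction(extracted, truth)
print(f"precision={precision:.2%} recall={recall:.2%} F={f_measure:.2%}")
```

In this toy case three of the five extracted features are in the truth set, so recall exceeds precision, the same direction as the averages reported above. A looser matching rule (for example, matching on stemmed terms) would change both values.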

    Item Type: Thesis (PhD)
    Additional Information: Thesis (PhD) - Faculty of Computer Science and Information Technology, University of Malaya, 2016.
    Uncontrolled Keywords: Software reuse; Product Lines Engineering (SPLE); Accessing Requirements; Terms Extraction; Feature Identification; Formation of Feature Model
    Subjects: Q Science > QA Mathematics > QA76 Computer software
    T Technology > T Technology (General)
    Divisions: Faculty of Computer Science & Information Technology
    Depositing User: Miss Dashini Harikrishnan
    Date Deposited: 10 Sep 2016 16:53
    Last Modified: 05 Mar 2019 08:32
    URI: http://studentsrepo.um.edu.my/id/eprint/6674
