Chong, Chun Yong (2016) Constrained clustering approach to aid in remodularisation of object-oriented software systems / Chong Chun Yong. PhD thesis, University of Malaya.
Abstract
Effective execution of software maintenance requires knowledge of the detailed working of software. The structure of a software, however, may not be clear to software maintainers because it is poorly designed or, worse, there is no updated software documentation. To effectively address this issue, researchers have proposed to apply software clustering to help in recovering a high-level semantic representation of the software design by grouping sets of collaborating software components into meaningful subsystems. This high-level semantic representation serves to help bridge the dichotomy between the perceived software design from the maintainers’ view and the actual code structure. However, software clustering is typically conducted in an unsupervised and rigid manner, where maintainers have no influence on the clustering results and only a single solution is produced for any given dataset. Even if maintainers possess additional information that could be useful to guide and improve the clustering results, traditional clustering algorithms have no way to take advantage of this information. These practical concerns have led the researcher to propose the idea of integrating domain knowledge into traditional unsupervised clustering algorithms, herewith referred as constrained clustering, a semi-supervised clustering technique where domain experts can explicitly exert their opinions in the form of explicit clustering constraints to restrict whether a pair of software components should or should not be clustered into the same subsystem. Apart from the explicit clustering constraints from domain experts, other sources of information to guide and improve clustering results can be derived implicitly from the source code itself. To help maintainers effectively identify and interpret the implicit information hidden in the source code, this study proposes representing software using weighted complex network in conjunction with graph theory to help in understanding and analysing the structure, behaviour, as well as the complexity of the software components and their iii relationships from the graph theory’s point of view. The results of the analysis can be subsequently converted into implicit clustering constraints. Hence, maintainers can make use of both the explicit and implicit constraints to help in creating a high-level semantic representation of the software design that is coherent and consistent with the actual code structure. This thesis proposes a constrained clustering approach to aid in remodularisation of poorly designed or poorly documented object-oriented software systems. The source code of an object-oriented software system is first converted into UML class diagrams. Next, information from the class diagrams are extracted to measure the strength of cohesion among related classes together with their relationships, and then transform them into a weighted complex network with its nodes and edges associated with measured weights. Graph theory metrics are subsequently applied onto the constructed weighted complex network so that the structure, behaviour, and the complexity of software components and their relationships can be analysed. The results are then converted into sets of clustering constraints. Guided by the explicit and implicit clustering constraints, sets of cohesive clusters are progressively derived to act as a high-level semantic representation of the software design. This research follows an empirical research methodology, where the proposed approach is validated using 40 object-oriented open-source software systems written in Java. Using MoJoFM, which is a well-established technique used to compare the similarity between multiple clustering results, the proposed approach achieves an aggregated average of 80.33% accuracy when compared against the original package diagrams of the 40 software systems, thus considerably outperforms conventional unconstrained clustering approach. The clustering results serve as supplementary information for software iv maintainers to aid in making critical decisions for re-engineering, maintaining and evolving software systems. Ultimately, this research helps in reducing the cost of software maintenance through better comprehension of the recovered software design.
Actions (For repository staff only : Login required)