Amini, Amineh (2014) An adaptive density-based method for clustering evolving data streams / Amineh Amini. PhD thesis, University of Malaya.
Abstract
Density-based methods have emerged as a worthwhile class of techniques for clustering data streams. They can discover clusters of arbitrary shape, handle noise, and cluster without prior knowledge of the number of clusters. Data streams are characterized by infinite volume and dynamic change; they allow only one or a small number of scans and demand fast response times. Because of these characteristics, traditional density-based clustering is not applicable. Recently, a number of density-based algorithms have been developed for clustering data streams. However, existing density-based data stream clustering algorithms are not without problems. The first is the high computation time required by the clustering process. The second is a dramatic decrease in clustering quality when the data span a range of densities. This research takes these problems into account and proposes a new density-based algorithm for clustering evolving data streams. The proposed method, called MuDi-Stream (Multi Density clustering algorithm for evolving data Stream), is an online-offline algorithm with four main components. Three of the components are applied in the online phase, while the fourth is used in the offline phase. The main tasks of these components are keeping synopsis information, pruning this information, and forming the final clusters. In the first component, a hybrid method combining density grid and micro clustering techniques maintains summary information in the form of core mini clusters, while outliers are mapped to the grid. In the second component, the data points inside a grid form a new core mini cluster once the grid reaches a density threshold. In the last component of the online phase, grids and core mini clusters are pruned to keep memory usage bounded.
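The record does not give the algorithm's formulas, so the following is only a minimal sketch of the general online pattern the abstract describes, not MuDi-Stream itself. It assumes a fading (exponential decay) weight model, a fixed absorption radius for core mini clusters, square grid cells for outliers, and fixed promotion and pruning thresholds; all constants and names (`DECAY`, `RADIUS`, `CELL`, `PROMOTE`, `PRUNE`, `Summary`, `OnlinePhase`) are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters, not taken from the thesis.
DECAY = 0.99    # fading factor applied per time step
RADIUS = 1.0    # max distance for absorbing a point into a core mini cluster
CELL = 1.0      # side length of a grid cell
PROMOTE = 3.0   # weight at which a grid cell becomes a core mini cluster
PRUNE = 0.5     # minimum weight a summary needs to survive pruning


@dataclass
class Summary:
    """Decayed weight and linear sum of the points absorbed so far."""
    ls: list
    w: float
    t: int

    def fade(self, now):
        # Scale weight and linear sum by the decay accumulated since last update.
        f = DECAY ** (now - self.t)
        self.ls = [x * f for x in self.ls]
        self.w *= f
        self.t = now

    def add(self, p, now):
        self.fade(now)
        self.ls = [a + b for a, b in zip(self.ls, p)]
        self.w += 1.0

    def center(self):
        # Fading scales ls and w by the same factor, so the center is unaffected.
        return [x / self.w for x in self.ls]


class OnlinePhase:
    def __init__(self):
        self.clusters = []  # core mini clusters
        self.grid = {}      # cell index tuple -> Summary of outlier points

    def insert(self, p, now):
        # Step 1: absorb the point into the nearest core mini cluster if close enough.
        best, best_d = None, math.inf
        for c in self.clusters:
            d = math.dist(p, c.center())
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= RADIUS:
            best.add(p, now)
            return
        # Otherwise treat it as an outlier and map it to its grid cell.
        key = tuple(math.floor(x / CELL) for x in p)
        cell = self.grid.setdefault(key, Summary([0.0] * len(p), 0.0, now))
        cell.add(p, now)
        # Step 2: promote a dense enough cell to a new core mini cluster.
        if cell.w >= PROMOTE:
            self.clusters.append(self.grid.pop(key))

    def prune(self, now):
        # Step 3: drop faded summaries so memory stays bounded.
        for s in self.clusters + list(self.grid.values()):
            s.fade(now)
        self.clusters = [c for c in self.clusters if c.w >= PRUNE]
        self.grid = {k: g for k, g in self.grid.items() if g.w >= PRUNE}


if __name__ == "__main__":
    stream = OnlinePhase()
    points = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2), (0.25, 0.15), (5.0, 5.0)]
    for t, p in enumerate(points):
        stream.insert(p, t)
    stream.prune(5)
    print(len(stream.clusters), len(stream.grid))  # prints "1 1"
```

In this toy run the first four points accumulate in one grid cell until its decayed weight crosses `PROMOTE`, at which point the cell becomes a core mini cluster; the distant fifth point starts a new outlier cell. Keeping outliers in a grid rather than as separate micro clusters is one way to keep the per-point cost low, which is the kind of trade-off the abstract alludes to.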
A new multi-density-based clustering method then forms the final clusters using both the summarized synopsis information and statistical information. The quality of the algorithm is comprehensively evaluated on various synthetic and real datasets with different characteristics, using a variety of quality metrics. The complexity analysis shows that MuDi-Stream uses limited time and memory, which makes it applicable to data streams. Furthermore, the scalability results show that the proposed algorithm scales in terms of both dimensionality and number of clusters. Finally, the experimental results show that the proposed method improves clustering quality in multi-density environments while minimizing computation time.
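The abstract does not specify which statistics the offline step uses. As a hypothetical stand-in, the sketch below groups mini-cluster centers with a DBSCAN-like connectivity rule in which each center gets its own linking radius, `factor` times its nearest-neighbour distance, so that dense and sparse regions are judged by different thresholds. The function name `offline_clusters` and the `factor` parameter are assumptions, not part of MuDi-Stream.

```python
import math


def offline_clusters(centers, factor=1.5):
    """Hypothetical offline step: label mini-cluster centers with final cluster ids.

    Two centers are linked only if their distance is within BOTH centers'
    local radii (factor * distance to their own nearest neighbour).
    """
    n = len(centers)
    parent = list(range(n))  # union-find forest over centers

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Local density statistic: distance from each center to its nearest other center.
    nn = [
        min((math.dist(centers[i], centers[j]) for j in range(n) if j != i),
            default=math.inf)
        for i in range(n)
    ]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(centers[i], centers[j])
            if d <= factor * nn[i] and d <= factor * nn[j]:
                parent[find(i)] = find(j)

    # Relabel union-find roots as consecutive ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    dense = [(0.0, 0.0), (0.0, 0.1), (0.0, 0.2), (0.0, 0.3)]
    sparse = [(10.0, 0.0), (10.0, 2.0), (10.0, 4.0)]
    print(offline_clusters(dense + sparse))  # prints [0, 0, 0, 0, 1, 1, 1]
```

A single global distance threshold would either merge the sparse group into noise (if set near 0.1) or risk over-merging dense regions (if set near 2); a per-center radius sidesteps that choice, which illustrates why multi-density data are hard for single-threshold methods.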
Item Type: | Thesis (PhD) |
---|---|
Additional Information: | Thesis (Ph.D.) -- Fakulti Sains Komputer dan Teknologi Maklumat, Universiti Malaya, 2014. |
Uncontrolled Keywords: | Density-based method; Clustering data streams |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > T Technology (General) |
Divisions: | Faculty of Computer Science & Information Technology |
Depositing User: | Mrs Nur Aqilah Paing |
Date Deposited: | 04 Mar 2015 13:11 |
Last Modified: | 04 Mar 2015 13:11 |
URI: | http://studentsrepo.um.edu.my/id/eprint/4684 |