A COMPARISON STUDY OF DATA CLUSTERING AND VISUALISATION TECHNIQUES WITH VARIOUS DATA TYPES

Ling, Chien (2020) A COMPARISON STUDY OF DATA CLUSTERING AND VISUALISATION TECHNIQUES WITH VARIOUS DATA TYPES. [Final Year Project Report] (Unpublished)


Abstract

Clustering identifies the intrinsic grouping in a set of unlabelled data and is applied in exploratory data mining and statistical data analysis. As the quality and complexity of data on the internet increase in today's rapidly evolving digital environment, clustering methods have become indispensable for finding patterns in data. Many types of clustering techniques have been developed, including partitioning methods, hierarchical clustering, density-based clustering, model-based clustering, and fuzzy clustering. This study focuses on three techniques: k-means clustering; agglomerative hierarchical clustering with Ward's, complete, and average linkage; and the Self-Organizing Map (SOM). The clustering algorithms are written in Python by modifying code obtained from the Internet. The project conducts experiments on the visualisation and performance analysis of the selected clustering methods, together with a case study that applies clustering to online product reviews. The visualisation experiment showed that each clustering technique has its own form of visualisation for cluster analysis. The predictive accuracy results indicated that k-means clustering and SOM are the most suitable techniques for cluster analysis. The case study concluded that the accuracy of clustering online product reviews is related to the structure and number of the sentences. Because this relationship is now known, extractive text summarisation with clustering can be improved and further developed for use in customer review systems.
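
For readers unfamiliar with two of the techniques named in the abstract, the following is a minimal illustrative sketch in Python, using only the standard library. It is not the code used in the report, which is not reproduced on this page. The function names, the sample points, and the parameter defaults are all assumptions made for illustration. Ward's linkage and the SOM are omitted for brevity.

```python
import math
import random

# Hypothetical sample data: two well-separated groups of 2-D points.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


def agglomerative(points, n_clusters, linkage="average"):
    """Bottom-up clustering. Returns clusters as lists of point indices."""
    if linkage not in ("complete", "average"):
        raise ValueError("this sketch supports 'complete' and 'average' only")
    clusters = [[i] for i in range(len(points))]  # every point starts alone

    def cluster_distance(a, b):
        pairwise = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "complete":
            return max(pairwise)  # farthest pair of members
        return sum(pairwise) / len(pairwise)  # mean over all pairs

    while len(clusters) > n_clusters:
        # Find the two closest clusters under the chosen linkage and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


if __name__ == "__main__":
    print(kmeans(POINTS, k=2))
    print(agglomerative(POINTS, n_clusters=2, linkage="complete"))
```

The two linkage options differ only in how the distance between two clusters is measured, which is why they share a single merge loop here. This naive version recomputes all pairwise distances at every merge, so it is suitable only for small data sets.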

Item Type: Final Year Project Report
Additional Information: Project Report (BSc.) -- Universiti Malaysia Sarawak, 2020.
Uncontrolled Keywords: K-means Clustering, Agglomerative Hierarchical Clustering, Self-Organizing Map (SOM), Extractive Text Summarisation with Clustering Technique.
Subjects: H Social Sciences > H Social Sciences (General)
Divisions: Academic Faculties, Institutes and Centres > Faculty of Cognitive Sciences and Human Development
Depositing User: Gani
Date Deposited: 24 Nov 2020 07:29
Last Modified: 24 Nov 2020 07:29
URI: http://ir.unimas.my/id/eprint/32941
