Data Distribution Aware Classification Algorithm based on K-Means

Abstract

Businesses increasingly need to make data-driven decisions based on precise data analysis. Many data mining strategies exist for this purpose, but they require continued attention from researchers so that they can be adapted to modern data analysis needs. K-Means is one of the most popular of these algorithms. This paper proposes a novel improvement to the classical K-Means classification algorithm. Data characteristics such as distribution, dimensionality, size, and sparseness are known to have a great impact on the success of K-Means clustering, which directly affects classification accuracy. In this study, the K-Means algorithm was modified to remedy the degradation in classification accuracy that is observed when the data distribution is not suited to clustering around centroids that are each represented by a single mean. Specifically, this paper proposes to intelligently include the effect of variance, based on the detected nature of the data distribution. To evaluate the performance improvement of the proposed method, several experiments were carried out on different real datasets. The results, obtained after extensive experiments, show that the proposed algorithm improves the classification accuracy of K-Means. The achieved performance was also compared against several recent classification studies that are based on different classification schemes.
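The abstract does not spell out the modified algorithm, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It assumes one centroid per class and a per-feature variance scaling of the squared distance. The function names `fit_class_stats` and `predict` and the toy data are hypothetical, chosen for illustration.

```python
import numpy as np

def fit_class_stats(X, y):
    """Return class labels, per-class means, and per-class feature variances."""
    labels = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in labels])
    # Small epsilon keeps the division stable for near-constant features.
    variances = np.array([X[y == c].var(axis=0) for c in labels]) + 1e-9
    return labels, means, variances

def predict(X, labels, means, variances=None):
    """Assign each row of X to the class with the nearest centroid.

    With variances=None the distance is plain squared Euclidean (mean only).
    Otherwise each squared feature difference is divided by that class's
    variance (an assumed form of "variance awareness", not the paper's).
    """
    diff = X[:, None, :] - means[None, :, :]  # (n_samples, n_classes, n_features)
    sq = diff ** 2
    if variances is not None:
        sq = sq / variances[None, :, :]
    return labels[sq.sum(axis=2).argmin(axis=1)]

# Toy data: class 0 is tightly packed around 0, class 1 is spread around 4.
X_train = np.array([[-0.1], [0.0], [0.1], [1.0], [4.0], [7.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
labels, means, variances = fit_class_stats(X_train, y_train)

query = np.array([[1.5]])
mean_only = predict(query, labels, means)             # uses centroids only
var_aware = predict(query, labels, means, variances)  # uses centroids and spread
print(mean_only, var_aware)
```

In this toy case the query point 1.5 lies closer to the tight class's mean, so the mean-only rule picks class 0, while the variance-scaled rule picks the widely spread class 1. Plain variance scaling tends to favour wide classes; a fuller treatment would need some correction for that, and how the paper handles this is not stated in the abstract.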

Authors and Affiliations

Tamer Tulgar, Ali Haydar, Ibrahim Ersan

  • EP ID EP261189
  • DOI 10.14569/IJACSA.2017.080946

How To Cite

Tamer Tulgar, Ali Haydar, Ibrahim Ersan (2017). Data Distribution Aware Classification Algorithm based on K-Means. International Journal of Advanced Computer Science & Applications, 8(9), 328-334. https://europub.co.uk./articles/-A-261189