Data Distribution Aware Classification Algorithm based on K-Means

Abstract

Making data-driven decisions based on precise data analysis is widely required across different businesses, and many data mining strategies exist for this purpose. Nevertheless, existing strategies need researchers' attention so that they can be adapted to modern data analysis needs. One of the popular algorithms is K-Means. This paper proposes a novel improvement to the classical K-Means classification algorithm. Data characteristics such as distribution, dimensionality, size, and sparseness are known to have a great impact on the success of K-Means clustering, which directly affects classification accuracy. In this study, the K-Means algorithm was modified to remedy the classification accuracy degradation that is observed when the data distribution is not suitable for clustering by centroids that are each represented by a single mean. Specifically, this paper proposes to intelligently include the effect of variance based on the detected distribution of the data. To evaluate the performance improvement of the proposed method, several experiments were carried out on different real datasets. The results of these extensive experiments show that the proposed algorithm improves the classification accuracy of K-Means. The achieved performance was also compared against several recent classification studies based on different classification schemes.
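The abstract does not give the authors' exact rule, so the following is only a minimal sketch, assuming a simplified setting in which each class is summarized by a single centroid (as in a K-Means-based classifier with one cluster per class). It contrasts the classical mean-only rule with one possible way of "including the effect of variance": a swapped-in diagonal Gaussian score that scales each feature's deviation by that class's own spread. The helper names (`fit_class_stats`, `mean_only_score`, `variance_aware_score`, `classify`) are hypothetical, and the step that detects the data distribution is not shown.

```python
import math
from statistics import fmean, pvariance


def fit_class_stats(X, y):
    """Return {label: (per-feature means, per-feature variances)} for each class."""
    stats = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))  # transpose: one tuple per feature
        means = [fmean(col) for col in columns]
        variances = [pvariance(col, mu) for col, mu in zip(columns, means)]
        stats[label] = (means, variances)
    return stats


def mean_only_score(x, means, variances):
    # Classical single-mean centroid rule: squared Euclidean distance, variance ignored.
    return sum((a - m) ** 2 for a, m in zip(x, means))


def variance_aware_score(x, means, variances, eps=1e-9):
    # Hypothetical variance-aware rule (NOT the paper's method): diagonal Gaussian
    # negative log-likelihood up to constants. eps guards against zero variance.
    return sum((a - m) ** 2 / (v + eps) + math.log(v + eps)
               for a, m, v in zip(x, means, variances))


def classify(x, stats, score=mean_only_score):
    """Assign x to the class whose centroid gives the lowest score."""
    return min(stats, key=lambda label: score(x, *stats[label]))


# Toy data: class "A" is tightly packed around 0, class "B" is spread around 10.
X = [[-0.1], [0.0], [0.1], [6.0], [10.0], [14.0]]
y = ["A", "A", "A", "B", "B", "B"]
stats = fit_class_stats(X, y)
print(classify([1.0], stats))                        # A
print(classify([1.0], stats, variance_aware_score))  # B
```

With a single mean per class, the point 1.0 is simply closer to the centroid of "A". Once each class's spread is taken into account, 1.0 lies far outside the narrow "A" distribution but well within the wide "B" one. This illustrates the kind of mismatch between data distribution and single-mean centroids that the abstract describes.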

Authors and Affiliations

Tamer Tulgar, Ali Haydar, Ibrahim Ersan

  • DOI 10.14569/IJACSA.2017.080946

How To Cite

Tamer Tulgar, Ali Haydar, Ibrahim Ersan (2017). Data Distribution Aware Classification Algorithm based on K-Means. International Journal of Advanced Computer Science & Applications, 8(9), 328-334. https://europub.co.uk./articles/-A-261189