Implementation of K-Means Clustering Algorithm in Hadoop Framework

Abstract

The drastic growth of digital data is an emerging area of concern that has drawn increasing attention to data mining techniques. The core data mining task involves programmatic or semi-programmatic analysis of large quantities of data to extract hidden, interesting patterns, such as groups of data records; this task is referred to as cluster analysis. Clustering is the partitioning of data items into different groups (clusters) so that the data objects in each cluster share common characteristics. Data collected in practical scenarios is, more often than not, random and unstructured or semi-structured, so there is a constant need to analyze such data sets to derive meaningful hidden information. In such scenarios, unsupervised algorithms are used to process unstructured and semi-structured data sets. Several clustering algorithms have been proposed in recent years, among which the k-means algorithm is one of the simplest and most popular unsupervised learning algorithms for solving the well-known clustering problem. K-means produces a user-specified number (k) of disjoint clusters. It requires k initial cluster centers, which must be specified beforehand and are selected at random. This paper discusses the implementation of the k-means algorithm in the MapReduce programming model, run in the Hadoop distributed environment.
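
The abstract does not include the authors' code. As a minimal sketch, assuming Euclidean distance, a fixed iteration cap, and a seeded random choice of the k initial centers, one k-means iteration can be expressed as a map step (assign each point to its nearest center) followed by a reduce step (average the points of each cluster). The names map_phase, shuffle, reduce_phase and kmeans are hypothetical, and the code runs in plain Python, simulating the Hadoop shuffle in memory rather than calling the Hadoop API.

```python
import math
import random
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))


def map_phase(points, centroids):
    """Mapper: emit (cluster_id, (vector, count)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def shuffle(pairs):
    """Group mapper output by key, as the framework's shuffle/sort step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum vectors and counts for one cluster, then divide to get its new center."""
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for vector, n in values:
        for d in range(dim):
            sums[d] += vector[d]
        count += n
    return key, tuple(s / count for s in sums)


def kmeans(points, k, max_iter=20, tol=1e-6, seed=0):
    """Driver: repeat map/shuffle/reduce until the centers stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly selected initial centers
    for _ in range(max_iter):
        groups = shuffle(map_phase(points, centroids))
        new_centroids = list(centroids)  # an empty cluster keeps its old center
        for key, values in groups.items():
            _, center = reduce_phase(key, values)
            new_centroids[key] = center
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    print(sorted(kmeans(data, k=2)))
```

Each mapper value is a (vector, count) pair rather than a bare point, so the same summing logic could also serve as a combiner that pre-aggregates partial sums on each node before the shuffle; in a real Hadoop job the driver loop would typically launch one MapReduce job per iteration and pass the current centers to the mappers.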

Authors and Affiliations

Uday Kumar S, Naveen D Chandavarkar

Related Articles

Synthesis, Characterization and Magnetic Properties of Mn2+ Doped CdGa2-2xO4 Oxide Spinels

Mn2+ doped CdGa2-2xO4 oxide spinels with ‘x’ values ranging from 0.15, 0.30, 0.45, and 0.60 were synthesized by sol – gel method via nitrate citrate route. X-ray powder diffraction analysis confirms the presence of cubi...

Performance Analysis of VARS Using Exhaust Gas Heat of C.I Engine

In this paper the waste heat from C. I engine is suggested as one of the alternative energy source for refrigeration system. In this study, an overview of utilization of waste heat with a brief literature of the current...

Design of Remote Monitoring System for the Detection of Vital Signs

Chronic heart failure (CHF) patients are commonly addicted to frequent hospital admissions. They should be constantly monitored for that sake they should be admitted in hospital. But frequent...

Brainwave and Alcohol Sensitising Helmet for Riders Safety

Two wheelers are widely used than other form of vehicles due to its low cost and simplicity. Most of the time rider doesn’t like to wear helmet which could be result in fatal accidents. Drunken driving and Drowsy drivin...

Partial Replacement of Cement with Marble Dust Powder in Cement Concrete

In this study, the investigation of performance on the strength parameter of concrete mix by partial replacing the cement with marble powder is done. Since marble powder employed in the various type of construction work...

  • EP ID EP20614

How To Cite

Uday Kumar S, Naveen D Chandavarkar (2015). Implementation of K-Means Clustering Algorithm in Hadoop Framework. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 3(5), -. https://europub.co.uk./articles/-A-20614