CluSandra: A Framework and Algorithm for Data Stream Cluster Analysis

Abstract

The clustering, or partitioning, of a dataset’s records into groups of similar records is an important aspect of knowledge discovery from datasets. A considerable amount of research has been applied to the identification of clusters in very large, multi-dimensional, static datasets. However, the traditional clustering and pattern recognition algorithms that have resulted from this research are inefficient for clustering data streams. A data stream is a dynamic dataset characterized by a sequence of data records that evolves over time, has extremely fast arrival rates, and is unbounded. Today, the world abounds with processes that generate high-speed, evolving data streams. Examples include click streams, credit card transactions, and sensor networks. The data stream’s inherent characteristics present an interesting set of time- and space-related challenges for clustering algorithms. In particular, processing time is severely constrained, and clustering must be performed in a single pass over the incoming data. This paper presents a clustering framework and an algorithm that, combined, address these challenges and allow end users to explore and gain knowledge from evolving data streams. Our approach integrates open source products that are used to control the data stream and to facilitate the harnessing of knowledge from it. Experimental results of testing the framework with various data streams are also discussed.

Authors and Affiliations

Jose R. Fernandez, Eman M. El-Sheikh

How To Cite

Jose R. Fernandez, Eman M. El-Sheikh (2011). CluSandra: A Framework and Algorithm for Data Stream Cluster Analysis. International Journal of Advanced Computer Science & Applications, 2(11), 87-99. https://europub.co.uk./articles/-A-113861