Investigate the Performance of Document Clustering Approach Based on Association Rules Mining

Abstract

The challenges of standard clustering methods and the weaknesses of the Apriori algorithm in frequent termset clustering motivate our research. Based on association rules mining, we have devised an efficient approach for Web Document Clustering (ARWDC). An efficient Multi-Tier Hashing Frequent Termsets algorithm (MTHFT) is used to improve the efficiency of mining association rules by speeding up the mining of frequent termsets. The documents are then initially partitioned based on the association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions; that is, the initial partitions overlap. After the partitions are made disjoint, the documents within each partition are grouped using descriptive keywords, which yields the final clusters. In this paper, we present an extensive analysis of the ARWDC approach on Reuters datasets of different sizes. Furthermore, the performance of our approach is evaluated with measures such as Precision, Recall and F-measure and compared with existing clustering algorithms such as Bisecting K-means and FIHC. The experimental results show that the efficiency, scalability and accuracy of the ARWDC approach are significantly improved on the Reuters datasets.
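The pipeline described in the abstract (mine frequent termsets, build overlapping initial partitions, then make them disjoint) can be illustrated with a small sketch. This is not the authors' implementation. `mine_frequent_termsets` is a plain level-wise miner that counts candidates in hash tables, standing in for MTHFT, whose internals are not given here. `initial_partitions` uses each frequent termset of two or more terms (the itemset an association rule would be drawn from) as a partition, and `make_disjoint` applies a hypothetical "longest, then most frequent termset wins" rule. All function names and the tie-breaking rule are assumptions, and the descriptive-keyword grouping step is not shown.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_termsets(docs, min_count, max_size=3):
    """Level-wise frequent termset mining with hash-map support counting.

    Simplified stand-in, NOT the authors' MTHFT: size-k candidates are
    counted by enumerating each document's k-term combinations and looking
    them up in a Python set/dict (hash tables).
    Returns {frozenset(terms): support_count}.
    """
    doc_sets = [frozenset(d) for d in docs]
    term_counts = Counter(t for d in doc_sets for t in d)
    current = {frozenset([t]): c for t, c in term_counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current and k <= max_size:
        # Join: unions of two frequent (k-1)-termsets with exactly k terms.
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in current for s in combinations(u, k - 1)):
                candidates.add(u)
        counts = Counter()
        for d in doc_sets:
            # Only individually frequent terms can occur in a frequent termset.
            terms = sorted(t for t in d if frozenset([t]) in frequent)
            for combo in combinations(terms, k):
                key = frozenset(combo)
                if key in candidates:
                    counts[key] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


def initial_partitions(docs, frequent, min_terms=2):
    """One (possibly overlapping) partition per frequent termset with at least
    min_terms terms, holding the indices of every document containing it."""
    doc_sets = [frozenset(d) for d in docs]
    return {ts: {i for i, d in enumerate(doc_sets) if ts <= d}
            for ts in frequent if len(ts) >= min_terms}


def make_disjoint(partitions, frequent):
    """Hypothetical rule: keep each document only in the partition whose termset
    is longest, then most frequent, then lexicographically largest."""
    best = {}
    for ts, members in partitions.items():
        key = (len(ts), frequent[ts], sorted(ts))
        for i in members:
            if i not in best or key > best[i][0]:
                best[i] = (key, ts)
    disjoint = {}
    for i, (_, ts) in best.items():
        disjoint.setdefault(ts, set()).add(i)
    return disjoint


if __name__ == "__main__":
    docs = [
        ["oil", "price", "opec"],
        ["oil", "price", "crude"],
        ["oil", "opec", "crude"],
        ["wheat", "grain", "export"],
        ["wheat", "grain", "price"],
    ]
    freq = mine_frequent_termsets(docs, min_count=2)
    parts = initial_partitions(docs, freq)
    print(parts)                       # document 0 sits in two partitions
    print(make_disjoint(parts, freq))  # each document now in exactly one
```

In this toy run, document 0 contains both {oil, price} and {oil, opec}, so it lands in two initial partitions, which mirrors the overlap the abstract describes. The disjoint step then keeps it in only one of them.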
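The abstract reports Precision, Recall and F-measure but does not define them here. A common definition in the FIHC and Bisecting K-means literature is assumed below: for class i and cluster j with n_ij shared documents, P = n_ij / n_j, R = n_ij / n_i and F = 2PR / (P + R). The overall score is then each class's best F over all clusters, weighted by class size. The function name `clustering_f_measure` and the input layout (one gold label and one cluster id per document) are illustrative choices.

```python
from collections import Counter


def clustering_f_measure(labels, clusters):
    """Overall clustering F-measure (assumed class-weighted best-match form).

    labels:   labels[d] is the true class of document d
    clusters: clusters[d] is the cluster id assigned to document d
    """
    n = len(labels)
    class_sizes = Counter(labels)          # n_i
    cluster_sizes = Counter(clusters)      # n_j
    overlap = Counter(zip(labels, clusters))  # n_ij
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best  # weight each class by its share of documents
    return total


if __name__ == "__main__":
    labels = ["acq", "acq", "acq", "earn", "earn", "earn"]
    clusters = [0, 0, 1, 1, 1, 1]
    print(round(clustering_f_measure(labels, clusters), 4))  # 0.8286
```

Taking the maximum over clusters for each class means a class is credited with its best-matching cluster, so the score reaches 1.0 only when every class maps exactly onto one cluster.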

Authors

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem

  • DOI 10.14569/IJACSA.2013.040820

How To Cite

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem (2013). Investigate the Performance of Document Clustering Approach Based on Association Rules Mining. International Journal of Advanced Computer Science & Applications, 4(8), 142-151. https://europub.co.uk./articles/-A-136006