An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2

Abstract

Data mining is commonly defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, such as XML data. This paper presents a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that yields significant gains over the unoptimized version of the algorithm. Experiments show that the Bayesian network approach achieves high precision and recall scores on several data sets. XMLDup also outperforms another state-of-the-art duplicate detection solution in terms of both efficiency (by up to 80%) and effectiveness.
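The idea of scoring two XML elements on both their values and their structure, and of abandoning a comparison early, can be illustrated with a minimal sketch. This is not the XMLDup Bayesian network or its pruning strategy: it is a simplified stand-in that averages string similarity over an element's text and its child tags, and stops once even perfect scores on the remaining parts could not reach a threshold. The names `text_similarity` and `duplicate_score`, the averaging rule, and the sample records are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for any value-level measure."""
    a = (a or "").strip().lower()
    b = (b or "").strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(x, y, threshold=0.0):
    """Heuristic score in [0, 1] that elements x and y describe the same object.

    The score is the mean of one factor for the elements' own text (if any)
    and one factor per child tag found in either element. Evaluation stops
    and returns 0.0 as soon as the best still-reachable mean is below
    `threshold` (an upper-bound cutoff, assumed here for illustration).
    """
    factors = []
    if (x.text or "").strip() or (y.text or "").strip():
        factors.append(("text", None))
    tags = sorted({c.tag for c in x} | {c.tag for c in y})
    factors.extend(("child", tag) for tag in tags)
    if not factors:
        # Two empty elements: compare tag names only.
        return 1.0 if x.tag == y.tag else 0.0

    total = 0.0
    for i, (kind, tag) in enumerate(factors):
        if kind == "text":
            total += text_similarity(x.text, y.text)
        else:
            # Best-matching pair of children with this tag; 0.0 if one side lacks it.
            pairs = ((a, b) for a in x.findall(tag) for b in y.findall(tag))
            total += max((duplicate_score(a, b) for a, b in pairs), default=0.0)
        remaining = len(factors) - (i + 1)
        # Upper bound: assume every remaining factor scores 1.0.
        if (total + remaining) / len(factors) < threshold:
            return 0.0
    return total / len(factors)


# Hypothetical sample records.
a = ET.fromstring("<movie><title>Heat</title><year>1995</year></movie>")
b = ET.fromstring("<movie><title>Heat.</title><year>1995</year></movie>")
c = ET.fromstring("<movie><title>Alien</title><year>1979</year></movie>")

print(duplicate_score(a, b))                  # near-duplicate pair
print(duplicate_score(a, c))                  # unrelated pair
print(duplicate_score(a, c, threshold=0.95))  # cut off early, reported as 0.0
```

In this sketch a pair is reported as 0.0 exactly when its full score would fall below the threshold, so the cutoff changes only how much work is done, not which pairs pass.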

Authors and Affiliations

A. Baladhandayutham, S. Roselin Mary

  • EP ID EP142076
  • DOI 10.9790/0661-1628101105

How To Cite

A. Baladhandayutham, S. Roselin Mary (2014). An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data. IOSR Journals (IOSR Journal of Computer Engineering), 16(2), 101-105. https://europub.co.uk./articles/-A-142076