An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2

Abstract

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures such as XML data. A novel method for XML duplicate detection, called XMLDup, is presented. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Experiments show that the Bayesian network approach achieves high precision and recall scores on several data sets. XMLDup is also able to outperform another state-of-the-art duplicate detection solution in terms of both efficiency (by up to 80%) and effectiveness.
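The idea of scoring two XML elements by both their content and their structure, and of pruning a comparison early, can be illustrated with a minimal Python sketch. This is not the XMLDup Bayesian network: it is a simplified stand-in that averages string similarities over child elements, and the names `text_similarity`, `duplicate_score`, and `threshold` are hypothetical. It also assumes each tag appears at most once per parent, and it prunes only at the top level.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; two empty values count as equal."""
    a = (a or "").strip().lower()
    b = (b or "").strip().lower()
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(e1, e2, threshold=0.0):
    """Average similarity over child tags, recursing into the structure.

    Assumption (not from the paper): a plain average replaces the
    Bayesian network's conditional probabilities.
    """
    tags = sorted({c.tag for c in e1} | {c.tag for c in e2})
    if not tags:
        # Leaf elements: compare their text content directly.
        return text_similarity(e1.text, e2.text)
    total = 0.0
    for i, tag in enumerate(tags):
        c1, c2 = e1.find(tag), e2.find(tag)  # first child with this tag
        if c1 is not None and c2 is not None:
            total += duplicate_score(c1, c2)
        # Pruning: assume every remaining child matches perfectly. If even
        # that optimistic bound is below the threshold, stop early.
        remaining = len(tags) - i - 1
        upper_bound = (total + remaining) / len(tags)
        if upper_bound < threshold:
            return upper_bound
    return total / len(tags)


a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
b = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
c = ET.fromstring("<movie><title>Alien</title><year>1979</year></movie>")
print(duplicate_score(a, b), duplicate_score(a, c, threshold=0.9))
```

When a comparison is pruned, the returned value is only an upper bound on the score, which is enough to classify the pair as non-duplicates for that threshold.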

Authors and Affiliations

A. Baladhandayutham, S. Roselin Mary

  • EP ID EP142076
  • DOI 10.9790/0661-1628101105

How To Cite

A. Baladhandayutham, S. Roselin Mary (2014). An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data. IOSR Journals (IOSR Journal of Computer Engineering), 16(2), 101-105. https://europub.co.uk./articles/-A-142076