An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2

Abstract

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, such as XML data. This paper presents a novel method for XML duplicate detection called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Experiments show that the Bayesian network approach achieves high precision and recall scores on several data sets. XMLDup also outperforms another state-of-the-art duplicate detection solution in terms of both efficiency, by up to 80%, and effectiveness.
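The idea of scoring two hierarchical elements on both their content and their structure, and of pruning comparisons that can no longer succeed, can be illustrated with a small Python sketch. It is not the XMLDup algorithm or its Bayesian network: the nested-dictionary element format, the `text_sim` and `dup_score` names, the use of `difflib` string similarity as a stand-in probability, and the average-then-multiply combination rule are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def text_sim(a: str, b: str) -> float:
    """String similarity in [0, 1], used as a stand-in for a match probability."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dup_score(x: dict, y: dict, threshold: float = 0.0) -> float:
    """Score two elements shaped like {"values": {...}, "children": {...}}.

    Hypothetical combination rule: average the similarities of shared values,
    then multiply by the average best-match score of each shared child group.
    Returns 0.0 early (prunes) once the score can no longer reach `threshold`.
    """
    xv, yv = x.get("values", {}), y.get("values", {})
    shared = sorted(set(xv) & set(yv))
    if shared:
        score = sum(text_sim(xv[k], yv[k]) for k in shared) / len(shared)
    else:
        score = 1.0  # assumption: no shared values means no evidence against

    # Every remaining factor is at most 1, so the running product is an
    # upper bound on the final score. Below the threshold, stop comparing.
    if score < threshold:
        return 0.0

    xc, yc = x.get("children", {}), y.get("children", {})
    for name in sorted(set(xc) & set(yc)):
        if not xc[name] or not yc[name]:
            continue
        # Match each child of x with its most similar child of y.
        best = [max(dup_score(cx, cy) for cy in yc[name]) for cx in xc[name]]
        score *= sum(best) / len(best)
        if score < threshold:
            return 0.0  # pruned: remaining child groups are never evaluated
    return score


# Illustrative records (invented for this example).
movie_a = {"values": {"title": "The Matrix", "year": "1999"},
           "children": {"actor": [{"values": {"name": "Keanu Reeves"}}]}}
movie_b = {"values": {"title": "Matrix, The", "year": "1999"},
           "children": {"actor": [{"values": {"name": "Keanu Reves"}}]}}

print(dup_score(movie_a, movie_b))
print(dup_score(movie_a, movie_b, threshold=0.9))
```

Because the score is a product of factors that never exceed 1, the partial product gives a safe upper bound, which is what makes the early exit valid in this sketch. A different combination rule would need a different bound.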

Authors and Affiliations

A. Baladhandayutham, S. Roselin Mary

  • EP ID EP142076
  • DOI 10.9790/0661-1628101105

How To Cite

A. Baladhandayutham, S. Roselin Mary (2014). An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data. IOSR Journals (IOSR Journal of Computer Engineering), 16(2), 101-105. https://europub.co.uk./articles/-A-142076