An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2
Abstract
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, such as XML data. This paper presents a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements, but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy is presented that is capable of significant gains over the unoptimized version of the algorithm. Experiments show that the Bayesian network approach achieves high precision and recall scores on several data sets. XMLDup also outperforms another state-of-the-art duplicate detection solution, both in efficiency (by up to 80%) and in effectiveness.
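The idea of scoring two XML elements by both their content and their structure, with pruning to skip hopeless comparisons, can be illustrated with a minimal sketch. This is not the XMLDup algorithm or its Bayesian network: it assumes a much simpler model in which an element's score is the product of its children's scores, leaf values are compared with a generic string similarity, and children are matched by tag name only. The names `text_similarity`, `duplicate_score`, and `threshold` are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; stands in for a per-field probability."""
    a = (a or "").strip().lower()
    b = (b or "").strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(e1, e2, threshold=0.0):
    """Return a score in [0, 1], or None if the pair was pruned.

    Assumption: the score of an element is the product of the scores of
    its children that share a tag (a simple conjunction, not a full
    Bayesian network). Elements with no shared child tags are compared
    by their text.
    """
    # Keep the first child of e2 for each tag (a simplification).
    by_tag = {}
    for child in e2:
        by_tag.setdefault(child.tag, child)
    shared = [(c, by_tag[c.tag]) for c in e1 if c.tag in by_tag]

    if not shared:
        score = text_similarity(e1.text, e2.text)
        return score if score >= threshold else None

    score = 1.0
    for c1, c2 in shared:
        # Every remaining factor is at most 1, so this child must reach
        # threshold / score for the final product to reach the threshold.
        child_threshold = threshold / score if threshold > 0 else 0.0
        child_score = duplicate_score(c1, c2, child_threshold)
        if child_score is None:
            return None  # pruned: the pair cannot reach the threshold
        score *= child_score
    return score


if __name__ == "__main__":
    a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
    b = ET.fromstring("<movie><title>The Matrx</title><year>1999</year></movie>")
    print(duplicate_score(a, b, threshold=0.5))
```

The pruning step relies only on the fact that each factor is at most 1, so a running product that has fallen below the threshold can never recover. A pair that is pruned returns `None` without its remaining children being compared.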
Authors and Affiliations
A. Baladhandayutham, S. Roselin Mary
Low Energy Adaptive Clustering Hierarchy based routing Protocols Comparison for Wireless Sensor Networks
Abstract: As a result of recent advances in microelectronic system fabrication, progress in ad-hoc networking routing protocols, integrated circuit technologies, wireless communications, microprocessor hardware and...
Low-Cost Bus Tracking System Using Area-Trace Algorithm
This paper proposes a low-cost Bus Tracking System that uses GPS and allows users to see the bus location on their mobile phones. It has the advantage that it does not use any internet data for the users to be notified...
A Study on A Hybrid Approach of Genetic Algorithm & Fuzzy To Improve Anomaly or Intrusion
Abstract: This paper describes a technique of applying Genetic Algorithm (GA) and fuzzy to network Intrusion Detection Systems (IDSs). A brief overview of a hybrid approach of genetic algorithm and fuzzy to improve anoma...
Adaptive Digital Filter Design for Linear Noise Cancellation Using Neural Networks
Abstract: Noise is the most serious issue in the filters and adaptive filters are subjected to this unwanted component. This paper deals with the problem of the adaptive noise and various adaptive algorithms functions wh...
Performance Analysis of Different Clustering Algorithm
Abstract: Clustering is the process of grouping objects into clusters such that the objects from the same clusters are similar and objects from different clusters are dissimilar. The relationship is often expressed...