Transforming Digital Unstructured and Semi-structured Data into Structured Data with the Aid of IE and KDT

Abstract

Data growth has accelerated exponentially with the advent of computers and networks, which have given data a digital form. Data can be classified into three categories: unstructured, semi-structured, and structured. Text mining concerns the extraction of relevant information, knowledge, or patterns from sources in unstructured or semi-structured form. This project, entitled “Transforming Digital Unstructured and Semi-structured Data into Structured Data with the Aid of IE and KDT”, demonstrates a framework for text mining that uses a learned information extraction system aided by KDT (Knowledge Discovery from Text) principles. The project's functionality centers on the integrated result of three modules: the IE (Information Extraction) module, the KDT (Knowledge Discovery from Text) module, and the Standard Protocols module. Pre-processing transforms unstructured or semi-structured data, such as HTML documents, text documents, and documents with .doc, .docx, or .pdf extensions, into a suitable format, which is then mined for interesting relationships. Standard protocols are defined for discovering additional information from input sources. For example, if the information extraction system has extracted skills such as “HTML” and “DHTML” from a computer job posting but could not find “XML” in the document, the relationship can be mined through predefined derivations framed in the Standard Protocols module. In addition, rules mined from the database extracted from a corpus of texts are used to predict additional information that could be extracted from future input documents, thereby improving the recall of the underlying extraction system. Results are presented from applying these techniques to a corpus of computer job announcements from an Internet newsgroup.
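
The abstract does not specify how derivations or mined rules are represented, so the following Python snippet is only a minimal sketch under stated assumptions. It assumes each extracted record is a set of skill strings, that a rule maps a set of antecedent skills to one predicted skill, and that rules are mined with simple support and confidence thresholds. The names `DERIVATION_RULES`, `mine_rules`, and `apply_rules` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import permutations

# Hypothetical predefined derivation, modeled on the abstract's example:
# a posting that mentions HTML and DHTML is assumed to imply XML as well.
DERIVATION_RULES = [(frozenset({"HTML", "DHTML"}), "XML")]


def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine single-antecedent rules a -> b from previously extracted records.

    A rule is kept when a and b co-occur in at least `min_support` records
    and count(a, b) / count(a) is at least `min_confidence`.
    (Assumed thresholds; the paper's mining method is not given.)
    """
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = set(record)
        item_counts.update(items)
        pair_counts.update(permutations(sorted(items), 2))
    rules = []
    for (a, b), n_ab in pair_counts.items():
        if n_ab >= min_support and n_ab / item_counts[a] >= min_confidence:
            rules.append((frozenset({a}), b))
    return rules


def apply_rules(extracted, rules):
    """Add every consequent whose antecedents are all present, until stable."""
    result = set(extracted)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent <= result and consequent not in result:
                result.add(consequent)
                changed = True
    return result


if __name__ == "__main__":
    # Predefined derivation: fills in a skill the extractor missed.
    print(apply_rules({"HTML", "DHTML"}, DERIVATION_RULES))

    # Mined rules: learned from a toy corpus, then applied to a new document.
    corpus = [{"HTML", "DHTML", "XML"}, {"HTML", "DHTML", "XML"}, {"HTML", "Java"}]
    print(apply_rules({"DHTML"}, mine_rules(corpus)))
```

Predicted items of this kind are what could raise recall: a skill the extractor missed is still added to the structured record when a sufficiently reliable rule supports it. A stricter confidence threshold would trade some of that recall for precision.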

Authors and Affiliations

Prakhyath Rai, Vijaya Murari T

How To Cite

Prakhyath Rai, Vijaya Murari T (2015). Transforming Digital Unstructured and Semi-structured Data into Structured Data with the Aid of IE and KDT. International Journal of Research in Computer and Communication Technology, 4(4), -. https://europub.co.uk./articles/-A-28185