A novel semantic level text classification by combining NLP and Thesaurus concepts

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 4

Abstract

Text categorization (also known as text classification or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. Automated text classification is attractive because it frees organizations from the need to organize document bases manually, which can be too expensive or simply infeasible given the time constraints of the application or the number of documents involved. Previous approaches represent documents at the semantic level using only the Wikipedia concepts related to syntactic-level terms. This paper proposes a new approach that also uses WordNet to build the semantic-level representation. The semantic weights of terms related to concepts from Wikipedia and WordNet are used to represent semantic information. A semantic vector space model of terms that combines WordNet and Wikipedia further improves the accuracy of text classification. Because two different concept extractors supply the concepts related to syntactic-level terms, a better concept vector space can be found for the documents, which yields improved classification. This study presents the classification framework. Within the framework, the primary information is retained and noise is reduced by compressing the original information, so that the framework can guarantee the quality of the input to all classifiers. The proposed method can further improve the performance of the classification framework by introducing Wikipedia together with WordNet. We find that the proposed approach results in high classification accuracy.
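The idea of combining two concept extractors into one concept vector space can be illustrated with a minimal sketch. This is not the authors' implementation: the lookup tables `WORDNET_CONCEPTS` and `WIKIPEDIA_CONCEPTS`, the function `concept_vector`, the source prefixes `wn` and `wiki`, and every weight below are hypothetical stand-ins for real extractors, and the weighting scheme (term frequency times semantic weight) is an assumption made for illustration only.

```python
from collections import Counter

# Hypothetical toy lookups standing in for real WordNet and Wikipedia
# concept extractors: term -> {concept: semantic weight}. All values are
# invented for illustration.
WORDNET_CONCEPTS = {
    "car": {"motor_vehicle": 1.0},
    "bank": {"financial_institution": 0.6, "slope": 0.4},
    "loan": {"debt": 1.0},
}
WIKIPEDIA_CONCEPTS = {
    "car": {"Automobile": 0.9, "Transport": 0.3},
    "bank": {"Bank": 0.8, "Finance": 0.5},
    "loan": {"Finance": 0.7},
}


def concept_vector(tokens, extractors):
    """Project a token list onto concepts from several extractors.

    Each concept weight is the sum, over the document's terms, of
    (term frequency * semantic weight of the term for that concept).
    Concepts are prefixed with their source so the two spaces stay distinct.
    """
    term_freq = Counter(tokens)
    vector = {}
    for term, count in term_freq.items():
        for source, lookup in extractors.items():
            for concept, weight in lookup.get(term, {}).items():
                key = f"{source}:{concept}"
                vector[key] = vector.get(key, 0.0) + count * weight
    return vector


doc = "bank loan bank car".split()
vec = concept_vector(doc, {"wn": WORDNET_CONCEPTS, "wiki": WIKIPEDIA_CONCEPTS})
for concept in sorted(vec):
    print(concept, round(vec[concept], 2))
```

In this sketch the resulting dictionary would serve as the document's semantic-level feature vector, which any standard classifier could consume after conversion to a fixed-length array.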

Authors and Affiliations

R. Nagaraj, Dr. V. Thiagarasu, P. Vijayakumar

  • EP ID EP94441
  • DOI 10.9790/0661-16461426

How To Cite

R. Nagaraj, Dr. V. Thiagarasu, P. Vijayakumar (2014). A novel semantic level text classification by combining NLP and Thesaurus concepts. IOSR Journals (IOSR Journal of Computer Engineering), 16(4), 14-26. https://europub.co.uk./articles/-A-94441