A novel semantic level text classification by combining NLP and Thesaurus concepts

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 4

Abstract

Text categorization (also known as text classification or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. Automated text classification is attractive because it frees organizations from the need to organize document bases manually, which can be too expensive or simply not feasible given the time constraints of the application or the number of documents involved. Previous approaches represent documents at the semantic level using only the Wikipedia concepts related to terms at the syntactic level. This paper proposes a new approach that also uses WordNet for the semantic-level representation. The semantic weights of terms related to concepts from Wikipedia and WordNet are used to represent semantic information. The semantic vector space model of terms obtained by combining WordNet and Wikipedia further improves the accuracy of text classification, because two different concept extractors supply the concepts related to the syntactic-level terms, which helps to find a better concept vector space for documents. The study also presents the classification framework. In this framework, the primary information is effectively kept and noise is reduced by compressing the original information, so that the framework can guarantee the quality of the input to all classifiers. The proposed method can further improve the performance of the classification framework by combining Wikipedia with WordNet. We find that the proposed approach results in high classification accuracy.
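The idea of merging concepts from two extractors into one concept vector can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionaries `WORDNET_CONCEPTS` and `WIKIPEDIA_CONCEPTS` stand in for real concept extractors, the mixing parameter `alpha` and the weighted-sum rule are hypothetical, and the cosine nearest-neighbour step is a placeholder classifier. The abstract does not specify the paper's actual weighting scheme or classifiers.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lookups standing in for real WordNet / Wikipedia
# concept extractors: term -> {concept: semantic weight}.
WORDNET_CONCEPTS = {
    "bank": {"financial_institution": 0.7, "riverside": 0.3},
    "loan": {"debt": 1.0},
    "river": {"stream": 1.0},
}
WIKIPEDIA_CONCEPTS = {
    "bank": {"Banking": 0.8, "financial_institution": 0.2},
    "loan": {"Banking": 0.6, "debt": 0.4},
    "river": {"Hydrology": 1.0},
}


def concept_vector(tokens, alpha=0.5):
    """Map a token list to a {concept: weight} dict.

    alpha scales the WordNet stand-in and (1 - alpha) the Wikipedia
    stand-in; this mixing rule is an assumption, not the paper's.
    """
    term_counts = Counter(tokens)
    vector = {}
    for term, count in term_counts.items():
        sources = ((WORDNET_CONCEPTS, alpha), (WIKIPEDIA_CONCEPTS, 1 - alpha))
        for source, source_weight in sources:
            for concept, weight in source.get(term, {}).items():
                contribution = count * source_weight * weight
                vector[concept] = vector.get(concept, 0.0) + contribution
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = sqrt(sum(value * value for value in u.values()))
    norm_v = sqrt(sum(value * value for value in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, labeled_docs, alpha=0.5):
    """Return the label of the most similar labeled document
    (a placeholder nearest-neighbour classifier)."""
    query = concept_vector(tokens, alpha)
    best = max(
        labeled_docs,
        key=lambda doc: cosine(query, concept_vector(doc[0], alpha)),
    )
    return best[1]


if __name__ == "__main__":
    training = [(["bank", "loan"], "finance"), (["river"], "geography")]
    print(classify(["loan"], training))
```

In this sketch a concept found by both extractors, such as `financial_institution` for the term "bank", accumulates weight from both sources, which is one simple way two extractors could reinforce each other in a shared concept space.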

Authors and Affiliations

R. Nagaraj, Dr. V. Thiagarasu, P. Vijayakumar

  • EP ID EP94441
  • DOI 10.9790/0661-16461426

How To Cite

R. Nagaraj, Dr. V. Thiagarasu, P. Vijayakumar (2014). A novel semantic level text classification by combining NLP and Thesaurus concepts. IOSR Journals (IOSR Journal of Computer Engineering), 16(4), 14-26. https://europub.co.uk./articles/-A-94441