Performance Analysis of Classification Learning Methods on Large Dataset using two Data Mining Tools
Journal Title: Journal of Independent Studies and Research - Computing - Year 2015, Vol 13, Issue 2
Abstract
Data is growing day by day, so processing it and selecting the right method and tool is a significant problem. Computer scientists process and analyse data with different machine learning methods using various data mining tools, aiming for high accuracy of results and minimum model-building time. Several data analysis and processing tools, such as WEKA, RapidMiner and KEEL, are available for processing, analysis and modelling. Still, no single tool is perfect or universally preferred for data processing and analysis. In this context, the authors present a comparative and analytical study of the performance of several classification algorithms (Naïve Bayes, KNN, IBK, Random Forest, C4.5 and J48) in two data mining tools, WEKA and RapidMiner, on a large dataset, evaluating their performance and analytical results with a focus on low error. The Adult Income dataset from the UCI data repository is used for this study. The aim of this study is to evaluate and assess the range of performance of different machine learning methods and two diverse data mining tools on dissimilar datasets. The results of each classification method and data mining tool are analysed and presented at the end.
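The evaluation the abstract describes (build each classifier, then compare its accuracy against its model-building time) can be sketched in plain Python. This is a minimal illustration, not the authors' WEKA or RapidMiner setup: it assumes a hypothetical two-class toy dataset in place of Adult Income, a simple hold-out split, and hand-written Gaussian Naive Bayes and k-NN classifiers. The names `make_toy_data`, `GaussianNaiveBayes`, `KNN` and `evaluate` are illustrative only.

```python
import math
import random
import time
from collections import Counter, defaultdict


def make_toy_data(n, seed=0):
    """Hypothetical two-class data (NOT the Adult dataset): Gaussian blobs at (0, 0) and (4, 4)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 4.0 * label
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    rng.shuffle(data)
    return data


class GaussianNaiveBayes:
    def fit(self, rows, labels):
        # Group rows by class, then store the class prior and a per-feature mean/variance.
        groups = defaultdict(list)
        for x, y in zip(rows, labels):
            groups[y].append(x)
        self.stats = {}
        for y, xs in groups.items():
            feats = []
            for j in range(len(xs[0])):
                col = [x[j] for x in xs]
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # avoid zero variance
                feats.append((mean, var))
            self.stats[y] = (len(xs) / len(rows), feats)
        return self

    def predict_one(self, x):
        # Pick the class with the highest log prior + sum of Gaussian log-likelihoods.
        def log_post(y):
            prior, feats = self.stats[y]
            lp = math.log(prior)
            for v, (mean, var) in zip(x, feats):
                lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            return lp
        return max(self.stats, key=log_post)


class KNN:
    def __init__(self, k=5):
        self.k = k

    def fit(self, rows, labels):
        # Lazy learner: "building the model" only stores the training data.
        self.rows, self.labels = list(rows), list(labels)
        return self

    def predict_one(self, x):
        # Majority vote among the k nearest training rows (Euclidean distance).
        nearest = sorted(range(len(self.rows)), key=lambda i: math.dist(x, self.rows[i]))[: self.k]
        return Counter(self.labels[i] for i in nearest).most_common(1)[0][0]


def evaluate(model, train, test):
    """Return (hold-out accuracy, model-building time in seconds)."""
    xs, ys = zip(*train)
    start = time.perf_counter()
    model.fit(xs, ys)
    build_time = time.perf_counter() - start
    correct = sum(model.predict_one(x) == y for x, y in test)
    return correct / len(test), build_time


data = make_toy_data(300)
train, test = data[:200], data[200:]
results = {name: evaluate(model, train, test)
           for name, model in [("NaiveBayes", GaussianNaiveBayes()), ("KNN(k=5)", KNN(k=5))]}
for name, (acc, secs) in results.items():
    print(f"{name:12s} accuracy={acc:.3f} build_time={secs * 1000:.2f} ms")
```

Note that k-NN's build time is close to zero because it defers its work to prediction time, so a comparison based on model-building time alone can favour lazy learners; prediction time is worth measuring as well.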
Extracting Key Sentences from Text
Automatic key sentence extraction from a text is a challenging task. It has numerous applications in text processing systems. The actual task of key sentence extraction consists of three main functionalities: (i) Identif...
Comparative Analysis of Collaborative Filtering on GraphLab, MLlib and Mahout
Recommendation systems are used to recommend items or products to the user based on their previous purchases, visits, interests, ratings, wish-lists or reviews to develop interest and to display the accurate and suitable...
An Investigation on Topic Maps Based Document Classification with Unbalance Classes
Classification of imbalanced data has become a widespread problem because most real-world datasets are imbalanced. In a classification task, one of the challenges is to learn the feature-space of classif...
Urdu Optical Character Recognition Technique for Jameel Noori Nastaleeq Script
Urdu OCRs have been an object of interest for many developers in recent years. Active research is being done on Urdu OCRs, but because of the complexity associated with Urdu fonts, it still lacks perfect...
Comprehensive Study of Textual Processing and Proposed Automatic Essay Evaluation System
For the last 50 years, work has been conducted on building systems that can evaluate or check essays like a human tutor, or even better than one; this is the goal of Automatic Ess...