Importance of Text Data Preprocessing & Implementation in RapidMiner
Journal Title: Annals of Computer Science and Information Systems - Year 2018, Vol 14, Issue
Abstract
Data preparation is an important phase before applying any machine learning algorithm, and text data is no exception: it must be prepared before a machine learning algorithm can be applied to it. This preparation is carried out through preprocessing. Preprocessing text means cleaning out noise such as stop words, punctuation, and terms that carry little weight in the context of the text. In this paper, we describe in detail how to prepare data for machine learning algorithms using the RapidMiner tool. We then describe the conversion of the bag of words into a term vector model and the various algorithms that can be applied in RapidMiner for data analysis and predictive modeling. We also discuss the challenges and recent applications of text mining.
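The preprocessing steps named in the abstract (removing punctuation and stop words, then turning each bag of words into a term vector) can be illustrated with a minimal sketch. It uses plain Python rather than RapidMiner, so it is not the paper's implementation. The stop-word list, the function names `preprocess` and `term_vectors`, and the sample sentences are assumptions made for illustration, and raw term counts are used as the vector weights, although other weightings such as TF-IDF are common.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "of", "on", "and", "to", "in", "it"}


def preprocess(text):
    """Lowercase the text, drop punctuation, tokenize, and remove stop words."""
    # Keeping only runs of letters and digits discards punctuation implicitly.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def term_vectors(documents):
    """Build a shared vocabulary and one raw term-count vector per document."""
    bags = [Counter(preprocess(doc)) for doc in documents]
    vocabulary = sorted(set().union(*bags))
    # A Counter returns 0 for terms that do not occur in a document.
    vectors = [[bag[term] for term in vocabulary] for bag in bags]
    return vocabulary, vectors


# Hypothetical sample documents.
docs = ["The cat sat on the mat.", "The dog chased the cat!"]
vocab, vecs = term_vectors(docs)
print(vocab)  # ['cat', 'chased', 'dog', 'mat', 'sat']
print(vecs)   # [[1, 0, 0, 1, 1], [1, 1, 1, 0, 0]]
```

Each row of `vecs` is one document expressed over the shared vocabulary, which is the form most learning algorithms expect as input.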
Authors and Affiliations
Vaishali Kalra, Rashmi Aggarwal