Importance of Text Data Preprocessing & Implementation in RapidMiner

Journal Title: Annals of Computer Science and Information Systems - Year 2018, Vol 14, Issue

Abstract

Data preparation is an important phase before applying any machine learning algorithm, and text data is no exception: it must be prepared before a machine learning algorithm can be applied to it. This preparation is carried out through preprocessing. Preprocessing text means cleaning out noise, such as stop words, punctuation, and terms that carry little weight in the context of the text. In this paper, we describe in detail how to prepare data for machine learning algorithms using the RapidMiner tool. Preprocessing is followed by conversion of the bag of words into a term vector model, and we describe the various algorithms that can be applied in RapidMiner for data analysis and predictive modeling. We also discuss the challenges and recent applications of text mining.
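The preprocessing steps named in the abstract (removing punctuation and stop words, then turning each document's bag of words into a term vector) can be illustrated with a minimal sketch. This is not the RapidMiner workflow the paper describes; it is a plain Python stand-in, and the stop-word list, the function names `preprocess` and `term_vectors`, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "on", "to", "in"}


def preprocess(text):
    """Lowercase, replace punctuation with spaces, split, drop stop words."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def term_vectors(docs):
    """Return (vocabulary, vectors); each vector holds raw term counts."""
    bags = [Counter(preprocess(d)) for d in docs]  # one bag of words per document
    vocab = sorted(set().union(*bags))             # shared, ordered vocabulary
    return vocab, [[bag[term] for term in vocab] for bag in bags]


# Sample documents are made up for this sketch.
docs = ["The cat sat on the mat.", "A cat and a dog!"]
vocab, vectors = term_vectors(docs)
```

Each row of `vectors` lines up with `vocab`, so a downstream learner sees every document as a fixed-length numeric vector. Raw counts are used here for simplicity; a weighting scheme such as TF-IDF could be substituted.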

Authors and Affiliations

Vaishali Kalra, Rashmi Aggarwal

  • EP ID EP569711
  • DOI 10.15439/2017KM46

How To Cite

Vaishali Kalra, Rashmi Aggarwal (2018). Importance of Text Data Preprocessing & Implementation in RapidMiner. Annals of Computer Science and Information Systems, 14(), 71-75. https://europub.co.uk./articles/-A-569711