Missing Data Imputation using Genetic Algorithm for Supervised Learning

Abstract

Data is an important asset for any organization to successfully run its business. Collected data is often of low quality: it may be noisy, incomplete, or contain missing values. If the quality of the data is low, the quality of the results of any data mining algorithm applied to it will also be low. In this paper, we propose a technique to deal with missing values. A genetic algorithm (GA) is used to estimate the missing values in datasets: the GA generates optimal sets of missing values, and information gain (IG) is used as the fitness function to measure the performance of an individual solution. Our goal is to impute missing values in a dataset for better classification results. The technique works even better when the rate of missing values or incomplete information is higher and the attributes/features with missing values have a greater number of distinct values. We compare the proposed technique with single imputation techniques and statistically based multiple imputation (MI) approaches, using various benchmark classification techniques and different performance measures. We show that the proposed method outperforms other state-of-the-art missing data imputation techniques.
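
The abstract does not give the chromosome encoding, the genetic operators, or the parameter settings, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single categorical attribute whose missing cells are marked None, one gene per missing cell, truncation selection with one-point crossover and random-reset mutation, and information gain of the imputed attribute with respect to the class label as the fitness. The function names (ga_impute, impute, information_gain) and all default parameters are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def impute(feature, chromosome):
    """Fill the None cells of the feature, in order, with the genes of the chromosome."""
    genes = iter(chromosome)
    return [next(genes) if v is None else v for v in feature]

def ga_impute(feature, labels, pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Search for fill-in values that maximise information gain (assumed GA design)."""
    rng = random.Random(seed)
    domain = sorted({v for v in feature if v is not None})  # observed values only
    n_missing = sum(v is None for v in feature)

    def fitness(chromosome):
        return information_gain(impute(feature, chromosome), labels)

    # Each individual proposes one value per missing cell.
    population = [[rng.choice(domain) for _ in range(n_missing)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_missing) if n_missing > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [rng.choice(domain) if rng.random() < mutation_rate else g
                     for g in child]  # random-reset mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return impute(feature, best), fitness(best)

# Toy data invented for illustration; None marks a missing value.
feature = ["a", None, "b", "b", None, "a", None, "b"]
labels = ["y", "y", "n", "n", "n", "y", "y", "n"]
filled, gain = ga_impute(feature, labels)
print(filled, round(gain, 3))
```

Observed values are never altered; only the None cells are searched over. A full version along the lines of the abstract would presumably handle several attributes at once and evaluate the imputed dataset with downstream classifiers, which this sketch leaves out.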

Authors and Affiliations

Waseem Shahzad, Qamar Rehman, Ejaz Ahmed

  • EP ID EP251109
  • DOI 10.14569/IJACSA.2017.080360

How To Cite

Waseem Shahzad, Qamar Rehman, Ejaz Ahmed (2017). Missing Data Imputation using Genetic Algorithm for Supervised Learning. International Journal of Advanced Computer Science & Applications, 8(3), 438-445. https://europub.co.uk./articles/-A-251109