Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework

Abstract

Big data pose new research challenges in the life sciences because of their variety, volume, veracity, velocity, and value. Predicting gene biomarkers is one of the vital research issues in bioinformatics, where microarray gene expression and network-based methods can be used. These datasets are very large, which causes main-memory problems. In this paper, a Random Committee Node Classifier (RCNC) algorithm is proposed for identifying cancer biomarkers, based on microarray gene expression data and Protein-Protein Interaction (PPI) data. The data are enriched from other public databases, such as IntAct, UniProt, and Gene Ontology (GO). When applied to different datasets, RCNC identifies cancer biomarkers with 99.16% accuracy, 99.96% precision, 99.24% recall, 99.16% F1-measure, and 99.6 ROC. To speed up performance, the algorithm is run within a MapReduce framework, where the MapReduce version of RCNC is much faster than the sequential version on large datasets.
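The abstract does not spell out how RCNC is split into map and reduce steps, so the following is only a rough illustration of the MapReduce pattern on PPI-style data, not the paper's algorithm. It counts interaction partners per protein (a simple node feature) in plain Python. The function names and the toy edge list are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_edge(edge):
    """Map step: emit (protein, 1) for both endpoints of one interaction."""
    a, b = edge
    return [(a, 1), (b, 1)]

def shuffle(pairs):
    """Group emitted values by key, as a MapReduce runtime would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_degree(key, values):
    """Reduce step: sum the counts emitted for one protein."""
    return key, sum(values)

def node_degrees(edges):
    """Run map, shuffle, and reduce in sequence on an edge list."""
    mapped = chain.from_iterable(map_edge(e) for e in edges)
    return dict(reduce_degree(k, v) for k, v in shuffle(mapped).items())

# Hypothetical toy edge list, not data from the paper.
toy_edges = [("TP53", "MDM2"), ("TP53", "BRCA1"), ("BRCA1", "BARD1")]
degrees = node_degrees(toy_edges)
```

In a real MapReduce deployment the map and reduce functions would run in parallel across partitions of the edge list; here they run sequentially in one process to keep the sketch self-contained.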

Authors and Affiliations

Taysir Soliman

  • DOI 10.14569/IJACSA.2015.061225

How To Cite

Taysir Soliman (2015). Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework. International Journal of Advanced Computer Science & Applications, 6(12), 184-189. https://europub.co.uk./articles/-A-101255