Reduction of Data at Namenode in HDFS using harballing Technique  

Abstract

HDFS, the Hadoop Distributed File System, is designed to handle large files (megabytes, gigabytes, or terabytes in size). Scientific applications have adopted HDFS/MapReduce for large-scale data analytics [1]. A major problem, however, is that small files are common in these applications. HDFS manages all of these small files through a single NameNode server [1]-[4]. Storing and processing small files in HDFS adds overhead to MapReduce programs and also degrades the performance of the NameNode [1]-[3]. In this paper we study the Hadoop archiving technique, which reduces the storage overhead of data on the NameNode and also helps improve performance by reducing the number of map operations in a MapReduce program. Hadoop provides the “harballing” archiving technique, which collects a large number of small files into a single large file. Hadoop Archive (HAR) is an effective solution to the problem of many small files: it packs a number of small files into large files so that the original files can still be accessed in parallel, transparently (without expanding the files) and efficiently. Hadoop creates the archive file with the “.har” extension. HAR increases the scalability of the system by reducing namespace usage and decreasing the operation load on the NameNode. This improvement is orthogonal to memory optimization in the NameNode and to distributing namespace management across multiple NameNodes [3].
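The namespace saving described in the abstract can be illustrated with a rough back-of-the-envelope model. The Python sketch below is not taken from the paper. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode memory per namespace object (file, directory, or block), a 128 MB block size, and a simplified archive made of one directory, a single part file, and two one-block index files. The function names and all constants are hypothetical and chosen only for illustration.

```python
import math

# Assumptions (not from the paper): a rule-of-thumb memory cost per
# namespace object and a typical HDFS block size.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB


def namenode_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count namespace objects for plain files: one per file plus one per block."""
    blocks = sum(max(1, math.ceil(size / block_size)) for size in file_sizes)
    return len(file_sizes) + blocks


def archived_objects(file_sizes, block_size=BLOCK_SIZE, index_files=2):
    """Rough object count after packing every file into one archive.

    Simplification: the archive is one directory holding a single part file
    and `index_files` index files that each fit in one block. A real archive
    may contain several part files.
    """
    part_blocks = max(1, math.ceil(sum(file_sizes) / block_size))
    # 1 directory + 1 part file + its blocks + each index file and its block
    return 1 + 1 + part_blocks + 2 * index_files


def estimate_memory(file_sizes):
    """Return (bytes before archiving, bytes after archiving) on the NameNode."""
    before = namenode_objects(file_sizes) * BYTES_PER_OBJECT
    after = archived_objects(file_sizes) * BYTES_PER_OBJECT
    return before, after


if __name__ == "__main__":
    small_files = [1024 * 1024] * 10_000  # ten thousand 1 MB files
    before, after = estimate_memory(small_files)
    print(f"before: {before} bytes, after: {after} bytes")
```

Under these assumptions, ten thousand 1 MB files occupy 20,000 namespace objects (about 3,000,000 bytes) before archiving and 85 objects (12,750 bytes) afterwards. The exact figures depend on the real block size, the number of part files, and the NameNode implementation, so the result should be read as an order-of-magnitude illustration only.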

Authors and Affiliations

Vaibhav G. Korat, Kumar Swamy Pamu

How To Cite

Vaibhav G. Korat, Kumar Swamy Pamu (2012). Reduction of Data at Namenode in HDFS using harballing Technique. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(4), 635-642. https://europub.co.uk./articles/-A-136150