Reduction of Data at Namenode in HDFS using harballing Technique  

Abstract

HDFS, the Hadoop Distributed File System, is designed to handle large files (megabytes, gigabytes, or terabytes in size). Scientific applications have adopted HDFS/MapReduce for large-scale data analytics [1]. A major problem, however, is that small files are common in these applications. HDFS manages all of these small files through a single Namenode server [1]-[4]. Storing and processing these small files in HDFS adds overhead to MapReduce programs and also degrades the performance of the Namenode [1]-[3]. In this paper we study the Hadoop archiving technique, which reduces the storage overhead of data on the Namenode and also helps improve performance by reducing the number of map operations in a MapReduce program. Hadoop provides the “harballing” archiving technique, which collects a large number of small files into a single large file. Hadoop Archive (HAR) is an effective solution to the problem of many small files. HAR packs a number of small files into large files so that the original files can be accessed in parallel, transparently (without expanding the files), and efficiently. Hadoop creates the archive file with the “.har” extension. HAR increases the scalability of the system by reducing namespace usage and decreasing the operation load on the Namenode. This improvement is orthogonal to memory optimization in the Namenode and to distributing namespace management across multiple Namenodes [3].
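The namespace saving described above can be illustrated with a rough back-of-the-envelope sketch. It is not taken from the paper: the function names are hypothetical, the 64 MB block size and the figure of about 150 bytes of Namenode heap per namespace object are commonly quoted rules of thumb used here as assumptions, and the archive is modelled as one directory, two small index files (`_index` and `_masterindex`, assumed to fit in one block each), and a single part file.

```python
import math

# Assumptions for illustration only (not figures from the paper).
BLOCK_SIZE = 64 * 1024 * 1024   # assumed HDFS block size: 64 MB
BYTES_PER_OBJECT = 150          # assumed Namenode heap per inode or block


def objects_without_har(file_sizes, block_size=BLOCK_SIZE):
    """Namespace objects when each file is stored on its own:
    one inode per file plus one object per block it occupies."""
    return sum(1 + math.ceil(size / block_size) for size in file_sizes)


def objects_with_har(file_sizes, block_size=BLOCK_SIZE):
    """Namespace objects for a single archive holding all the files:
    the .har directory, two index files (inode + one block each,
    assumed), and one part file with as many blocks as the data needs."""
    data_blocks = math.ceil(sum(file_sizes) / block_size)
    index_objects = 2 * (1 + 1)      # _index and _masterindex
    return 1 + index_objects + 1 + data_blocks


def estimated_heap_bytes(object_count):
    """Very rough Namenode heap estimate for a given object count."""
    return object_count * BYTES_PER_OBJECT


if __name__ == "__main__":
    small_files = [1024 * 1024] * 10_000   # ten thousand 1 MB files
    before = objects_without_har(small_files)
    after = objects_with_har(small_files)
    print(before, estimated_heap_bytes(before))
    print(after, estimated_heap_bytes(after))
```

Under these assumptions, ten thousand 1 MB files occupy 20,000 namespace objects when stored individually and 163 when packed into one archive. The real numbers depend on the cluster's block size, the number of part files, and the size of the index files.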

Authors and Affiliations

Vaibhav G. Korat, Kumar Swamy Pamu

How To Cite

Vaibhav G. Korat, Kumar Swamy Pamu (2012). Reduction of Data at Namenode in HDFS using harballing Technique. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(4), 635-642. https://europub.co.uk./articles/-A-136150