Efficient Processing of XML Documents in Hadoop Map Reduce

Journal Title: International Journal on Computer Science and Engineering - Year 2014, Vol 6, Issue 9

Abstract

XML has dominated the enterprise landscape for fifteen years and remains the most commonly used data format. Despite its popularity, using XML for "Big Data" is challenging because of its semi-structured nature, its rather demanding memory requirements, and its lack of support for some complex data structures such as maps. While a number of tools and technologies for processing XML are readily available, the common approach in map-reduce environments is to create a "custom solution" based, for example, on Apache Hive User Defined Functions (UDFs). Because XML processing is a common use case, this paper describes a generic approach to handling XML based on the Apache Hive architecture. The described functionality complements the existing family of Hive serializers/deserializers for other popular data formats, such as JSON, and makes it much easier for users to work with large amounts of data in XML format.
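
To make the idea of a generic deserializer concrete, the following is a minimal Python sketch of how a declarative column mapping might turn one XML record into a flat row, and how element attributes could be surfaced as a map. It is not the authors' implementation and does not use the Hive SerDe API; the names `COLUMNS`, `deserialize`, and `attributes_as_map`, as well as the sample record, are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: column name -> (path relative to the record root, attribute or None).
# A real deserializer would read such a mapping from table properties; here it is hard-coded.
COLUMNS = {
    "id": (".", "id"),         # attribute on the record element itself
    "title": ("title", None),  # text of a direct child element
    "year": ("meta/year", None),  # text of a nested element
}


def deserialize(record_xml, columns=COLUMNS):
    """Parse one XML record and return a dict of column name -> string value (or None)."""
    root = ET.fromstring(record_xml)
    row = {}
    for name, (path, attr) in columns.items():
        node = root if path == "." else root.find(path)
        if node is None:
            row[name] = None            # missing element becomes a NULL-like value
        elif attr is not None:
            row[name] = node.get(attr)  # attribute value, or None if absent
        else:
            row[name] = node.text
    return row


def attributes_as_map(record_xml, path):
    """Return the attributes of the element at `path` as a plain key/value map."""
    node = ET.fromstring(record_xml).find(path)
    return dict(node.attrib) if node is not None else {}


record = (
    '<book id="42"><title>Hive</title>'
    '<meta lang="en" fmt="pdf"><year>2014</year></meta></book>'
)
row = deserialize(record)
meta_map = attributes_as_map(record, "meta")
```

Keeping the mapping declarative is what lets one piece of code serve many document shapes, which is the contrast the abstract draws with per-dataset custom solutions.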

Authors and Affiliations

Dmitry Vasilenko, Mahesh Kurapati

How To Cite

Dmitry Vasilenko, Mahesh Kurapati (2014). Efficient Processing of XML Documents in Hadoop Map Reduce. International Journal on Computer Science and Engineering, 6(9), 329-333. https://europub.co.uk./articles/-A-94484