Feature-based Similarity Method for Aligning the Malay and English News Document

Journal: International Journal of Computers & Technology (2013), Vol. 11, Issue 4

Abstract

A corpus-based translation approach can supply reliable translation knowledge in addition to dictionaries or machine translation. However, such corpora are scarce, especially for low-resource languages. Much work has been reported on aligning multilingual documents, particularly among European languages, but far less has addressed languages with limited linguistic resources. One challenge is aligning the multilingual documents that do exist for these languages in order to build comparable corpora. This article describes an alignment method that uses statistical features of the documents, namely the document titles, the text of the contents, and the named entities present in each document. The method focuses on English and Malay news documents, Malay being considered a low-resource language. Source and target documents are compared pairwise. The alignments are evaluated with accuracy, precision, and recall, using a three-level relevance scale (Same story, Shared aspect, and Unrelated) to judge each aligned pair. The results indicate that the method aligns the news documents well, with an accuracy of 96% and an average precision of 81%.
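The abstract does not say how the three features are weighted or how Malay text is matched against English text. The following is a minimal Python sketch of feature-based pairwise scoring under stated assumptions, not the authors' implementation: Malay tokens are mapped to English through a tiny hypothetical word list (`MS_EN`), title and content similarity are bag-of-words cosine scores, named-entity similarity is a Jaccard overlap, and the three scores are combined with hypothetical weights (`WEIGHTS`). All function names and the sample documents are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical Malay-to-English word list; the paper's translation resource
# is not described in the abstract.
MS_EN = {
    "banjir": "flood", "melanda": "hits", "kerajaan": "government",
    "hantar": "sends", "bantuan": "aid", "kepada": "to", "mangsa": "victims",
    "di": "in", "harga": "price", "minyak": "oil", "naik": "rises",
    "minggu": "week", "ini": "this", "kata": "says",
}

# Hypothetical weights for (title, content, named entities); they sum to 1.
WEIGHTS = (0.3, 0.4, 0.3)


def tokens(text):
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def translate(words, lexicon):
    """Word-by-word lookup; unknown words (often names) are kept unchanged."""
    return [lexicon.get(w, w) for w in words]


def cosine(a, b):
    """Cosine similarity of two token lists treated as term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(n * cb[t] for t, n in ca.items())
    norm = (math.sqrt(sum(n * n for n in ca.values()))
            * math.sqrt(sum(n * n for n in cb.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Case-insensitive overlap of two sets of named entities."""
    a = {e.lower() for e in a}
    b = {e.lower() for e in b}
    return len(a & b) / len(a | b) if a | b else 0.0


def similarity(en_doc, ms_doc, lexicon=MS_EN, weights=WEIGHTS):
    """Weighted sum of title, content and named-entity similarity."""
    w_title, w_text, w_ne = weights
    title = cosine(tokens(en_doc["title"]),
                   translate(tokens(ms_doc["title"]), lexicon))
    text = cosine(tokens(en_doc["text"]),
                  translate(tokens(ms_doc["text"]), lexicon))
    ne = jaccard(en_doc["entities"], ms_doc["entities"])
    return w_title * title + w_text * text + w_ne * ne


def align(en_docs, ms_docs):
    """Pair each English document with its highest-scoring Malay document."""
    pairs = []
    for i, en in enumerate(en_docs):
        scores = [similarity(en, ms) for ms in ms_docs]
        j = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, j, scores[j]))
    return pairs


en_docs = [
    {"title": "Flood hits Kelantan",
     "text": "Government sends aid to flood victims in Kelantan",
     "entities": {"Kelantan"}},
    {"title": "Oil price rises",
     "text": "Petronas says oil price rises this week",
     "entities": {"Petronas"}},
]
ms_docs = [
    {"title": "Harga minyak naik",
     "text": "Petronas kata harga minyak naik minggu ini",
     "entities": {"Petronas"}},
    {"title": "Banjir melanda Kelantan",
     "text": "Kerajaan hantar bantuan kepada mangsa banjir di Kelantan",
     "entities": {"Kelantan"}},
]

for i, j, score in align(en_docs, ms_docs):
    print(f"EN {i} -> MS {j} (score {score:.2f})")
```

Taking the best-scoring Malay document for every English document forces each story to receive a partner; a practical aligner would presumably also need a score threshold so that pairs with no real counterpart can be judged Unrelated.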

Authors

Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma

DOI: 10.24297/ijct.v11i4.3125

How to Cite

Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma (2013). Feature-based Similarity Method for Aligning the Malay and English News Document. International Journal of Computers & Technology, 11(4), 2410-2421. https://europub.co.uk./articles/-A-650311