Feature-based Similarity Method for Aligning the Malay and English News Document

Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2013, Vol 11, Issue 4

Abstract

Corpus-based translation approaches can be used to obtain reliable translation knowledge, in addition to dictionaries or machine translation. However, the availability of such corpora is very limited, especially for low-resource languages. Many works have been reported on the alignment of multilingual documents, especially among European languages, but fewer focus on languages with fewer linguistic resources. One of the challenges is to align the available multilingual documents in order to create comparable corpora for these languages. This article describes an alignment method that utilizes statistical features of the documents, such as the documents' titles, the texts of their contents, and the named entities present in each document. The method focuses on English and Malay news documents, where Malay is considered a low-resource language. Source and target documents were compared in pairs. Accuracy, precision, and recall measurements were used to evaluate the results, with three relevance scales (Same story, Shared aspect, and Unrelated) used to assess the alignment pairs. The results indicate that the method performed well in aligning the news documents, with an accuracy of 96% and an average precision of 81%.
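
The abstract names three feature groups (titles, content text, named entities) and three evaluation measures, but does not give the formulas. The following is only a minimal sketch of how such a pairwise score and its evaluation could look. The cosine and Jaccard measures, the weights, the dictionary keys, and the function names `pair_score` and `evaluate` are all illustrative assumptions, not the authors' method. It also assumes that the Malay tokens have already been mapped into English by some earlier step, and that relevance has been reduced to a binary judgment.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Set overlap between two collections of named entities."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def pair_score(src, tgt, weights=(0.3, 0.4, 0.3)):
    """Weighted mix of title, text, and entity similarity (weights are assumed)."""
    w_title, w_text, w_ne = weights
    return (w_title * cosine(src["title"], tgt["title"])
            + w_text * cosine(src["text"], tgt["text"])
            + w_ne * jaccard(src["entities"], tgt["entities"]))

def evaluate(predicted, relevant, all_pairs):
    """Accuracy, precision, and recall over sets of (source, target) id pairs.

    Assumes `predicted` and `relevant` are both subsets of `all_pairs`.
    """
    tp = len(predicted & relevant)
    fp = len(predicted - relevant)
    fn = len(relevant - predicted)
    tn = len(all_pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(all_pairs) if all_pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

In a sketch like this, each source document would be scored against every candidate target document, and pairs whose score passes some chosen threshold would be kept as alignments before being passed to `evaluate`.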

Authors and Affiliations

Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma

  • EP ID EP650311
  • DOI 10.24297/ijct.v11i4.3125

How To Cite

Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma (2013). Feature-based Similarity Method for Aligning the Malay and English News Document. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 11(4), 2410-2421. https://europub.co.uk./articles/-A-650311