Choosing the Optimal Segmentation Level for POS Tagging of the Quranic Arabic
Journal Title: Current Journal of Applied Science and Technology - Year 2017, Vol 19, Issue 1
Abstract
As a morphologically rich language, Arabic poses special challenges for Part-of-Speech (POS) tagging. Words in Arabic texts often contain several segments, each with its own POS category. The choice of segmentation level, that is, whether the input unit is the word or the morpheme, is a major issue in designing any Arabic natural language processing system. In word-based approaches, words are used as the atomic units of the language, and composite POS tags are assigned to them. Large amounts of training data are therefore required to ensure statistical significance, and these approaches suffer from data sparseness and unknown words. In morpheme-based approaches, the morpheme components of words are used as the atomic units. This, however, results in a high ambiguity rate and a small context for resolving that ambiguity, because the span of the n-gram might be limited to a single word. This paper compares and contrasts the morpheme-based and word-based statistical POS tagging strategies. It evaluates the tagging performance of three statistical models, namely the Arabic HMM POS tagger with prefix guessing models, the Arabic HMM POS tagger with linear interpolation guessing models, and the TnT tagger, given training data from both the morpheme-based and word-based tokenization levels. It also studies the influence of each choice on the performance of these models in terms of tagging accuracy and time complexity. Results show that the morpheme-based POS tagging strategy is better suited to training statistical POS tagging models, as it provides better overall tagging accuracy and much faster training and tagging times.
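The contrast between the two segmentation levels can be made concrete with a small sketch. The corpus below is a hypothetical, hand-segmented toy example in rough transliteration, and the tag names (CONJ, PREP, N, V) are illustrative assumptions, not the tagset or data used in the paper. It only shows how the same text yields composite tags and many distinct forms at the word level, but a smaller vocabulary and tagset over more tokens at the morpheme level.

```python
# Toy, hand-segmented corpus (hypothetical, not from the paper).
# Each word is a list of (morpheme, tag) pairs; tag names are illustrative.
CORPUS = [
    [("wa", "CONJ"), ("kitab", "N")],
    [("bi", "PREP"), ("kitab", "N")],
    [("wa", "CONJ"), ("bi", "PREP"), ("kitab", "N")],
    [("kataba", "V")],
    [("wa", "CONJ"), ("kataba", "V")],
    [("kitab", "N")],
]


def word_level(corpus):
    """One unit per word: concatenated form plus a composite tag."""
    return [
        ("".join(m for m, _ in word), "+".join(t for _, t in word))
        for word in corpus
    ]


def morpheme_level(corpus):
    """One unit per morpheme: each segment keeps its own simple tag."""
    return [pair for word in corpus for pair in word]


def summarize(units):
    """Count tokens, distinct forms (types), and distinct tags."""
    return {
        "tokens": len(units),
        "types": len({form for form, _ in units}),
        "tags": len({tag for _, tag in units}),
    }


if __name__ == "__main__":
    print("word-based:    ", summarize(word_level(CORPUS)))
    print("morpheme-based:", summarize(morpheme_level(CORPUS)))
```

In this toy corpus every word-level form and composite tag occurs exactly once, which is the data sparseness the abstract describes. The morpheme level reuses four forms and four tags across eleven tokens, at the cost of longer sequences in which an n-gram window may not reach beyond a single word.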
Authors and Affiliations
Fadl Mutaher Ba-Alwi, Mohammed Albared, Tareq Al-Moslmi
Community Level Vulnerability to Climate Change: A Comparative Case Study between Selected Naga Tribes in India
Aim: To assess the community level vulnerability of two dominant Naga tribes, viz. the Angami and the Ao due to climate extremes and variability. Study Design: Exploratory research design. Place and Duration of Study:...
An Analysis of the Potential, Constraints and Strategies for Development of Marirangwe Farm (A Project of the Women’s University in Africa)
Aims: The aim of the study was to conduct an analysis of the potential, constraints and strategies for development of Marirangwe Farm. Marirangwe Farm is a project of the Women’s University in Africa (WUA) in Mashonaland...
Simulation of Working No-load Induction Machine with Assigned Electrical Parameters in Adapted File MATLAB
Papers dealing with a specific type of no-load induction machine are largely concerned with the conventional design. In each case, however, there is another way of presenting the working principle and arrangement of t...
The Effect of Heat Input on the Mechanical and Corrosion Properties of AISI 304 Electric ARC Weldments
The mechanical and corrosion performances of austenitic stainless steels arc weld joints in service environments have been established to be influenced by the process parameters used in carrying out the welding process....
Microstructural Evolution of Aluminum-4043/Nickel-Coated Silicon Carbide Composites Produced via Stir Casting
Aluminum-silicon carbide (Al/SiC) metal matrix composites have become promising engineering materials owing to their high strength-to-weight ratio. However, there is formation of an aluminum carbide phase which is deleteriou...