Convolutional Neural Networks in Predicting Missing Text in Arabic
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 6
Abstract
Missing text prediction is one of the major concerns of the Natural Language Processing deep learning community. However, most text prediction research targets languages other than Arabic. In this paper, we take a first step in training a deep learning language model on the Arabic language. Our contribution is the prediction of missing text in text documents by applying Convolutional Neural Networks (CNN) to Arabic language models. We built CNN-based language models with settings specific to the Arabic language. We prepared our dataset from a large quantity of text documents freely downloaded from the Arab World Books, Hindawi Foundation, and Shamela datasets. To calculate prediction accuracy, we compared documents with complete text against the same documents with missing text. We carried out training, validation, and testing in three stages, aiming to increase prediction performance. In the first stage, the model was trained on documents by the same author; in the second stage, on documents from the same dataset; and in the third stage, on all documents combined. The training, validation, and test steps were repeated many times, changing the author, the dataset, and the author-dataset combination, respectively. We also enlarged the training data by feeding the CNN model a larger quantity of text each time. The model achieved high performance in Arabic text prediction using Convolutional Neural Networks, with accuracy reaching 97.8% in the best case.
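The accuracy comparison described in the abstract (complete documents versus the same documents with missing text) could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `masked_accuracy`, the token-list representation, and the sample tokens are hypothetical, and the abstract does not say how text is tokenized or how missing positions are chosen.

```python
def masked_accuracy(original_tokens, predicted_tokens, masked_positions):
    """Fraction of missing positions where the prediction matches the original.

    All names here are illustrative assumptions, not taken from the paper.
    """
    if len(original_tokens) != len(predicted_tokens):
        raise ValueError("token sequences must have the same length")
    if not masked_positions:
        raise ValueError("at least one masked position is required")

    # Count only the positions that were missing in the incomplete document.
    correct = sum(
        1 for i in masked_positions
        if original_tokens[i] == predicted_tokens[i]
    )
    return correct / len(masked_positions)


# Hypothetical example: positions 1 and 3 were missing; one is recovered.
original = ["ذهب", "الولد", "إلى", "المدرسة"]
predicted = ["ذهب", "الولد", "إلى", "البيت"]
accuracy = masked_accuracy(original, predicted, [1, 3])
```

Scoring only the missing positions keeps tokens that were never removed from inflating the result; whether the paper measures accuracy this way is not stated in the abstract.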
Authors and Affiliations
Adnan Souri, Mohamed Alachhab, Badr Eddine Elmohajir, Abdelali Zbakh