Classifying Arabic Text Using KNN Classifier

Abstract

With the tremendous number of electronic documents available, there is a great need to classify documents automatically. Classification is the task of assigning objects (images, text documents, etc.) to one of several predefined categories. Because the selection of important terms is vital to classifier performance, this paper uses feature set reduction techniques such as stop word removal, stemming, and term thresholding. Three term-selection techniques are applied to a corpus of 1000 documents that fall into five categories. A comparison study is performed to find the effect of using full-word, stem, and root term indexing methods. A K-nearest-neighbors (KNN) classifier is used in this study. The averages over all folds of recall, precision, fallout, and error rate were calculated. The results of the experiments carried out on the dataset show the importance of using k-fold testing, since it exposes the variation in the averages of recall, precision, fallout, and error rate for each category across the 10 folds.
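
The four per-category measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the common one-vs-rest definitions: recall = TP/(TP+FN), precision = TP/(TP+FP), fallout = FP/(FP+TN), and error rate = (FP+FN)/N. The function names `per_category_metrics` and `average_over_folds` are hypothetical and are not taken from the paper.

```python
def per_category_metrics(y_true, y_pred, category):
    """One-vs-rest measures for a single category (assumed definitions)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    tn = sum(1 for t, p in pairs if t != category and p != category)
    total = tp + fp + fn + tn
    return {
        # Guard every ratio against an empty denominator.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "fallout": fp / (fp + tn) if (fp + tn) else 0.0,
        "error_rate": (fp + fn) / total if total else 0.0,
    }


def average_over_folds(fold_metrics):
    """Average each measure over a list of per-fold metric dictionaries."""
    if not fold_metrics:
        return {}
    keys = fold_metrics[0].keys()
    return {k: sum(m[k] for m in fold_metrics) / len(fold_metrics) for k in keys}


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = ["sports", "sports", "politics", "economy"]
    y_pred = ["sports", "politics", "politics", "economy"]
    print(per_category_metrics(y_true, y_pred, "sports"))
```

In a 10-fold setup, one such dictionary would be computed per fold and per category, and `average_over_folds` would then give the per-category averages that the abstract refers to.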

Authors and Affiliations

Amer Al-Badarenah, Emad Al-Shawakfa, Khaleel Al-Rababah, Safwan Shatnawi, Basel Bani-Ismail

  • EP ID EP154286
  • DOI 10.14569/IJACSA.2016.070633

How To Cite

Amer Al-Badarenah, Emad Al-Shawakfa, Khaleel Al-Rababah, Safwan Shatnawi, Basel Bani-Ismail (2016). Classifying Arabic Text Using KNN Classifier. International Journal of Advanced Computer Science & Applications, 7(6), 259-268. https://europub.co.uk./articles/-A-154286