Classifying Arabic Text Using KNN Classifier
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2016, Vol 7, Issue 6
Abstract
With the tremendous amount of electronic documents now available, there is a great need to classify documents automatically. Classification is the task of assigning objects (images, text documents, etc.) to one of several predefined categories. Because the selection of important terms is vital to classifier performance, feature-set reduction techniques such as stop-word removal, stemming, and term thresholding were used in this paper. Three term-selection techniques are applied to a corpus of 1000 documents that fall into five categories. A comparison study is performed to find the effect of using full-word, stem, and root term-indexing methods. The k-nearest-neighbors (KNN) classifier is used in this study. The averages over all folds of recall, precision, fallout, and error rate were calculated. The results of the experiments carried out on the dataset show the importance of using k-fold testing, since it reveals how the averages of recall, precision, fallout, and error rate vary for each category over the 10 folds.
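The evaluation pipeline the abstract describes (term vectors, a KNN classifier, and per-category recall, precision, fallout, and error rate averaged over k folds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes whitespace tokenization, raw term-frequency vectors, cosine similarity, majority voting among the k neighbours, contiguous folds, and the standard one-vs-rest definitions of the four measures. The helper names (`vectorize`, `knn_predict`, `per_category_metrics`, `cross_validate`) are hypothetical, and Arabic stemming or root extraction is not implemented; a stemmer would be applied to the tokens inside `vectorize`.

```python
import math
from collections import Counter


def vectorize(doc, stop_words=frozenset()):
    """Term-frequency vector from whitespace tokens, with stop words removed (assumed scheme)."""
    return Counter(t for t in doc.split() if t not in stop_words)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query_vec, k=3):
    """Majority vote among the k most similar (vector, label) pairs.

    Counter.most_common keeps first-seen order on ties, so a tied vote goes to
    the label that appears first among the similarity-sorted neighbours.
    """
    neighbours = sorted(train, key=lambda item: cosine(item[0], query_vec), reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def per_category_metrics(y_true, y_pred, category):
    """One-vs-rest recall, precision, fallout and error rate for one category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == category and p == category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "fallout": fp / (fp + tn) if fp + tn else 0.0,
        "error_rate": (fp + fn) / len(pairs) if pairs else 0.0,
    }


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds over range(n)."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, test
        start += size


def cross_validate(docs, labels, k_folds=10, k_neighbours=3, stop_words=frozenset()):
    """Average each category's metrics over k folds.

    Note: a category missing from a test fold contributes 0.0 recall for that fold.
    """
    vectors = [vectorize(d, stop_words) for d in docs]
    categories = sorted(set(labels))
    sums = {c: {"recall": 0.0, "precision": 0.0, "fallout": 0.0, "error_rate": 0.0}
            for c in categories}
    for train_idx, test_idx in k_fold_indices(len(docs), k_folds):
        train = [(vectors[i], labels[i]) for i in train_idx]
        y_true = [labels[i] for i in test_idx]
        y_pred = [knn_predict(train, vectors[i], k_neighbours) for i in test_idx]
        for c in categories:
            for name, value in per_category_metrics(y_true, y_pred, c).items():
                sums[c][name] += value
    return {c: {name: total / k_folds for name, total in m.items()} for c, m in sums.items()}
```

For example, `cross_validate(docs, labels, k_folds=10, stop_words=frozenset({"the"}))` returns a dictionary mapping each category to its fold-averaged measures. Contiguous folds are used only for simplicity; with real data the documents would normally be shuffled or stratified by category before splitting.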
Authors and Affiliations
Amer Al-Badarenah, Emad Al-Shawakfa, Khaleel Al-Rababah, Safwan Shatnawi, Basel Bani-Ismail