Script Identification for printed document images at text-line level using DCT and PCA
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2013, Vol 12, Issue 5
Abstract
The progress of information technology and the wide reach of the Internet are drastically changing all fields of activity. As a result, a very large number of people need to interact with computer systems more frequently. To make human–machine interaction more effective in such situations, it is desirable to have systems capable of handling inputs in a variety of forms, such as printed or handwritten paper documents. India is a multilingual country in which more than 22 official languages are written in 12 different scripts, so designing an OCR system there is both essential and complicated, and the task becomes even harder when a single document contains multiple languages. In an automated multilingual environment, document processing systems that rely on OCR must therefore identify the script of a document so that the appropriate script-specific OCR tool can be selected. In this paper, a script identification approach for Indian scripts is proposed at the text-line level. It is a visual appearance-based script recognition method. Features are extracted using the Discrete Cosine Transform (DCT) and Principal Component Analysis (PCA), and a Modified-KNN classifier is then used for recognition. The proposed method is tested on printed document images in 11 major Indian languages, and a recognition accuracy of 95% is obtained.
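The abstract names the pipeline (DCT features, PCA reduction, Modified-KNN classification) but not its details. The following is a minimal pure-Python sketch of that kind of pipeline under stated assumptions: a text line is a small grayscale image given as a list of rows; the feature vector is the k × k block of lowest-frequency orthonormal DCT-II coefficients; PCA is computed by power iteration with deflation; and classification uses plain majority-vote KNN, because the paper's modification of KNN is not described here. All function names (`dct2_low`, `pca_fit`, `pca_transform`, `knn_predict`, `toy_line`) and parameter values are hypothetical, and the toy data is synthetic, not the paper's dataset.

```python
# Hypothetical sketch of a DCT -> PCA -> KNN pipeline; not the paper's implementation.
import math
from collections import Counter


def dct2_low(img, k):
    """Orthonormal 2-D DCT-II of a grayscale image (list of equal-length rows),
    keeping only the k x k lowest-frequency coefficients, flattened row-wise."""
    h, w = len(img), len(img[0])

    def alpha(u, n):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    feats = []
    for u in range(k):
        for v in range(k):
            s = 0.0
            for x in range(h):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * h))
                for y in range(w):
                    s += img[x][y] * cx * math.cos(math.pi * (2 * y + 1) * v / (2 * w))
            feats.append(alpha(u, h) * alpha(v, w) * s)
    return feats


def _matvec(m, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in m]


def pca_fit(vectors, n_components, iters=200):
    """PCA via power iteration with deflation on the covariance matrix."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        vec = [1.0 + 0.1 * j for j in range(d)]  # arbitrary deterministic start
        lam = 0.0
        for _ in range(iters):
            nxt = _matvec(cov, vec)
            lam = math.sqrt(sum(x * x for x in nxt))  # converges to the eigenvalue
            if lam < 1e-12:
                break
            vec = [x / lam for x in nxt]
        if lam < 1e-12:
            break  # no variance left to explain
        components.append(vec)
        # Remove the found direction so the next pass finds the next component.
        cov = [[cov[i][j] - lam * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return mean, components


def pca_transform(vec, mean, components):
    """Project a centred feature vector onto the principal components."""
    centered = [a - m for a, m in zip(vec, mean)]
    return [sum(c * p for c, p in zip(centered, comp)) for comp in components]


def knn_predict(train_feats, train_labels, query, k=3):
    """Plain k-nearest-neighbour majority vote (not the paper's Modified-KNN)."""
    nearest = sorted(
        (math.dist(f, query), label) for f, label in zip(train_feats, train_labels)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def toy_line(headline, shift, h=8, w=16):
    """Synthetic 'text line': vertical strokes, optionally topped by a bar that
    loosely imitates a script headline. Purely illustrative data."""
    img = [[0.0] * w for _ in range(h)]
    for x in range(2, h):
        for y in range(w):
            if (y + shift) % 5 == 0:
                img[x][y] = 1.0
    if headline:
        for x in range(2):
            img[x] = [1.0] * w
    return img


# Toy usage: train on three stroke offsets per class, query an unseen offset.
train = [(toy_line(hl, s), "headline" if hl else "plain")
         for hl in (True, False) for s in range(3)]
raw = [dct2_low(img, 4) for img, _ in train]
mean, comps = pca_fit(raw, 2)
feats = [pca_transform(r, mean, comps) for r in raw]
labels = [lab for _, lab in train]
query = pca_transform(dct2_low(toy_line(True, 3), 4), mean, comps)
print(knn_predict(feats, labels, query))  # headline
```

Keeping only the low-frequency DCT coefficients summarizes the coarse distribution of ink in a line (such as a horizontal headline) while discarding fine detail, and PCA then keeps the directions of largest variance in the training features before the nearest-neighbour search. A real system would likely also normalize line height and use a fast DCT rather than this direct O(k²HW) evaluation.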
Authors and Affiliations
Monali Jindal