Combined method for scanned documents images segmentation using sequential extraction of regions

Abstract

<p>We propose a combined method for segmenting images of scanned documents which, in contrast to known methods, first separates the graphics and photograph regions from the text regions and the background. This separation relies on an analysis of connected components, which differ for graphics, photograph, and text regions. The selected regions are then classified as photographs or graphics using a block method. It was established that splitting the already extracted regions into blocks affects segmentation quality less than applying the block method directly to the original image. To extract the text regions, which are more complex in shape, from the background, the neighborhood of each pixel is processed.</p><p>To detect the boundaries of illustrations in images of scanned documents, we applied the Bloomberg method. To classify an illustration as a photograph or as graphics, it is split into blocks of pixels. Each block is described by a vector of two features: the mean value of the local gradient magnitude, and the mean value of a function that localizes linear objects (graphics and text characters) in images of scanned documents. The resulting feature vectors were classified using a support vector machine.</p><p>When extracting the text regions, we applied low-frequency filtering and thresholding.</p><p>The combined method was used to segment test images of scanned newspaper articles from the MediaTeam document database at Oulu University (Finland). It was established that the combined method increases the speed of image segmentation while maintaining high processing quality.</p>
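
The block features and the text-extraction step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a grayscale image stored as a list of rows, central differences for the local gradient, a box (mean) filter as a stand-in for the low-frequency filter, and a fixed threshold of 128. All function names and parameter values are hypothetical. The second block feature (the function that localizes linear objects) and the support vector machine classifier are left out because the abstract does not define them.

```python
def gradient_magnitude(img):
    """Central-difference gradient magnitude; border pixels reuse the nearest neighbour."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def block_means(values, block):
    """Mean of `values` over non-overlapping block x block tiles (edge tiles may be smaller)."""
    h, w = len(values), len(values[0])
    means = []
    for y0 in range(0, h, block):
        row = []
        for x0 in range(0, w, block):
            tile = [values[y][x]
                    for y in range(y0, min(y0 + block, h))
                    for x in range(x0, min(x0 + block, w))]
            row.append(sum(tile) / len(tile))
        means.append(row)
    return means


def box_filter(img, radius):
    """Mean filter over a (2*radius+1)^2 window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(y - radius, 0), min(y + radius + 1, h))
                      for i in range(max(x - radius, 0), min(x + radius + 1, w))]
            out[y][x] = sum(window) / len(window)
    return out


def text_mask(img, radius=1, thresh=128):
    """Smooth, then mark pixels darker than `thresh` as candidate text (1), else background (0)."""
    smoothed = box_filter(img, radius)
    return [[1 if v < thresh else 0 for v in row] for row in smoothed]
```

Calling `block_means(gradient_magnitude(img), block)` gives one mean-gradient value per block, which would form the first component of each block's feature vector. `text_mask(img)` returns a binary map of candidate text pixels. In practice the block size, filter radius, and threshold would need tuning to the scan resolution.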

Authors and Affiliations

Marina Polyakova, Alesya Ishchenko, Natalya Volkova, Oleg Pavlov

  • EP ID EP528151
  • DOI 10.15587/1729-4061.2018.142735

How To Cite

Marina Polyakova, Alesya Ishchenko, Natalya Volkova, Oleg Pavlov (2018). Combined method for scanned documents images segmentation using sequential extraction of regions. Восточно-Европейский журнал передовых технологий, 5(2), 6-15. https://europub.co.uk./articles/-A-528151