PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Tonazzini A., Bedini L., Salerno E. Independent Component Analysis for Document Restoration. In: International Journal on Document Analysis and Recognition, vol. 7, n. 1 (2004), pp. 17-27. Springer, 2004.
 
 
Abstract
(English)
We propose a novel approach to restoring digital document images, with the aim of improving text legibility and OCR performance. These are often compromised by artifacts in the background, arising from many kinds of degradation, such as spots, underwritings, and show-through or bleed-through effects. So far, background removal techniques have been based on local, adaptive filters and morphological-structural operators, to cope with frequent low-contrast situations. For the specific problem of bleed-through/show-through, most work has been based on comparing the front and back pages. This, however, requires a preliminary registration of the two images. Our approach views the problem as one of separating overlapped texts, and then reformulates it as a Blind Source Separation problem, solved through Independent Component Analysis techniques. These methods have the advantage that no models are required for the background. In addition, we use the spectral components of the image at different bands, so that there is no need for registration. Examples of bleed-through cancellation and recovery of underwriting from palimpsests are provided.
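The separation idea in the abstract can be illustrated with a small synthetic sketch. It is not the authors' implementation: it assumes only two sources (a hypothetical front-text mask and a bleed-through mask), a linear mixture observed in two spectral channels with made-up weights, and a simple kurtosis-based contrast maximized by a grid search over rotations of the whitened data. All names (`whiten`, `separate_two_sources`, `red`, `green`) are illustrative.

```python
import math
import random

def whiten(x1, x2):
    """Center two channels, decorrelate them, and scale to unit variance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    a = sum(v * v for v in c1) / n                   # variance of channel 1
    d = sum(v * v for v in c2) / n                   # variance of channel 2
    b = sum(p * q for p, q in zip(c1, c2)) / n       # covariance
    theta = 0.5 * math.atan2(2 * b, a - d)           # principal-axis angle
    ct, st = math.cos(theta), math.sin(theta)
    y1 = [ct * p + st * q for p, q in zip(c1, c2)]
    y2 = [-st * p + ct * q for p, q in zip(c1, c2)]
    s1 = math.sqrt(sum(v * v for v in y1) / n)
    s2 = math.sqrt(sum(v * v for v in y2) / n)
    return [v / s1 for v in y1], [v / s2 for v in y2]

def excess_kurtosis(u):
    """Excess kurtosis of a zero-mean, unit-variance signal."""
    return sum(v ** 4 for v in u) / len(u) - 3.0

def separate_two_sources(x1, x2, steps=90):
    """Two-channel ICA: rotate whitened data to maximize non-Gaussianity."""
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        phi = (math.pi / 2) * k / steps              # rotations in [0, pi/2)
        c, s = math.cos(phi), math.sin(phi)
        u1 = [c * p + s * q for p, q in zip(z1, z2)]
        u2 = [-s * p + c * q for p, q in zip(z1, z2)]
        score = excess_kurtosis(u1) ** 2 + excess_kurtosis(u2) ** 2
        if best is None or score > best[0]:
            best = (score, u1, u2)
    return best[1], best[2]

def correlation(u, v):
    """Pearson correlation, used here only to check the recovery."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)

# Synthetic data (assumption): sparse binary "ink" masks for the two texts.
rng = random.Random(0)
n = 2000
front = [1.0 if rng.random() < 0.10 else 0.0 for _ in range(n)]
back = [1.0 if rng.random() < 0.10 else 0.0 for _ in range(n)]

# Hypothetical mixing: each spectral channel sees both texts with
# different strengths, which is what makes the separation possible.
red = [0.9 * f + 0.3 * b for f, b in zip(front, back)]
green = [0.5 * f + 0.8 * b for f, b in zip(front, back)]

est1, est2 = separate_two_sources(red, green)
for name, est in (("est1", est1), ("est2", est2)):
    print(name,
          "corr with front:", round(abs(correlation(est, front)), 3),
          "corr with back:", round(abs(correlation(est, back)), 3))
```

As with any ICA method, the recovered components come back in arbitrary order, sign, and scale, so the sketch compares them with the true sources through absolute correlations. Because both channels are sampled on the same pixel grid, no registration step appears anywhere, which mirrors the point made in the abstract.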
Subject: document processing
I.4 Image Processing and Computer Vision
I.4.3 Enhancement
I.4.8 Scene Analysis. Color
I.5 Pattern Recognition
I.5.4 Applications. Text processing
I.7 Document and Text Processing
I.7.5 Document Capture. Document analysis


For further information, contact: Librarian http://puma.isti.cnr.it
